Import pymupdf_1.17.4+ds1.orig.tar.xz

author Bastian Germann <bastiangermann@fishpost.de>

Fri, 7 Aug 2020 11:03:11 +0000 (12:03 +0100)

committer Bastian Germann <bastiangermann@fishpost.de>

Fri, 7 Aug 2020 11:03:11 +0000 (12:03 +0100)
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 0000000..e89c89c
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,32 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: bug
+assignees: JorjMcKie
+
+---
+
+_**Please provide all mandatory information!**_
+
+## Describe the bug (mandatory)
+A clear and concise description of what the bug is.
+
+## To Reproduce (mandatory)
+Explain the steps to reproduce the behavior. For example, include a minimal code snippet, example files, etc.
+
+## Expected behavior (optional)
+Describe what you expected to happen (if not obvious).
+
+## Screenshots (optional)
+If applicable, add screenshots to help explain your problem.
+
+## Your configuration (mandatory)
+ - Operating system, potentially version and bitness
+ - Python version, bitness
+ - PyMuPDF version, installation method (**wheel** or **generated** from source).
+
+For example, the output of `print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)` would be sufficient (for the first two bullets).
+
+## Additional context (optional)
+Add any other context about the problem here.
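The "Your configuration" section of the bug template can be filled in with a short script that extends the template's own `print(...)` one-liner. This is a minimal sketch, assuming PyMuPDF is importable as `fitz` (as in the template's example). The helper name `collect_config` is hypothetical and not part of PyMuPDF. It reports Python version and bitness, the platform, and `fitz.__doc__`. The installation method (wheel or source build) cannot be detected this way and still has to be stated by hand.

```python
import platform
import sys


def collect_config() -> str:
    """Gather the details the bug report template asks for (hypothetical helper)."""
    try:
        import fitz  # PyMuPDF's import name, as used in the template's example
        pymupdf_info = fitz.__doc__
    except ImportError:
        pymupdf_info = "PyMuPDF not installed"
    bits = platform.architecture()[0]  # interpreter bitness, e.g. '64bit'
    lines = [
        f"Python: {sys.version} ({bits})",
        f"Platform: {sys.platform} / {platform.platform()}",
        f"PyMuPDF: {pymupdf_info}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(collect_config())
```

Pasting the printed output into the issue covers the first two bullets plus the PyMuPDF version. Only the installation method needs to be added manually.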
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 0000000..f5ed2dc
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: enhancement
+assignees: JorjMcKie
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Potentially add an issue reference.
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+Are there several options for how your request could be met?
+
+**Additional context**
+Add any other context or screenshots about the feature request here.
diff --git a/.github/ISSUE_TEMPLATE/general-purpose.md b/.github/ISSUE_TEMPLATE/general-purpose.md
new file mode 100644
index 0000000..59dd59c
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/general-purpose.md
@@ -0,0 +1,10 @@
+---
+name: General Purpose
+about: Use this form for questions, comments, etc.
+title: 'Question / Comment:'
+labels: question
+assignees: JorjMcKie
+
+---
+
+
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e55e79d
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,6 @@
+*.pyc
+*.so
+*.o
+*.swp
+build/
+demo/README.rst
diff --git a/.vs/ProjectSettings.json b/.vs/ProjectSettings.json
new file mode 100644
index 0000000..d33b3d6
--- /dev/null
+++ b/.vs/ProjectSettings.json
@@ -0,0 +1,3 @@
+{
+  "CurrentProjectSetting": "Keine Konfigurationen"
+}
\ No newline at end of file
diff --git a/.vs/PyMuPDF/v15/.suo b/.vs/PyMuPDF/v15/.suo
new file mode 100644
index 0000000..f061189
Binary files /dev/null and b/.vs/PyMuPDF/v15/.suo differ
diff --git a/.vs/PyMuPDF/v15/Browse.VC.db b/.vs/PyMuPDF/v15/Browse.VC.db
new file mode 100644
index 0000000..ccb14f9
Binary files /dev/null and b/.vs/PyMuPDF/v15/Browse.VC.db differ
diff --git a/.vs/VSWorkspaceState.json b/.vs/VSWorkspaceState.json
new file mode 100644
index 0000000..d282b3b
--- /dev/null
+++ b/.vs/VSWorkspaceState.json
@@ -0,0 +1,7 @@
+{
+  "ExpandedNodes": [
+    ""
+  ],
+  "SelectedNode": "\\README.md",
+  "PreviewInSolutionExplorer": false
+}
\ No newline at end of file
diff --git a/.vs/slnx.sqlite b/.vs/slnx.sqlite
new file mode 100644
index 0000000..98aef7b
Binary files /dev/null and b/.vs/slnx.sqlite differ
diff --git a/COPYING b/COPYING
new file mode 100644
index 0000000..94a9ed0
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,674 @@
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.  We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors.  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights.  Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received.  You must make sure that they, too, receive
+or can get the source code.  And you must show them these terms so they
+know their rights.
+
+  Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+  For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software.  For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+  Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so.  This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software.  The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable.  Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products.  If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+  Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary.  To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+  The Corresponding Source for a work in source code form is that
+same work.
+
+  2. Basic Permissions.
+
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+  4. Conveying Verbatim Copies.
+
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+  5. Conveying Modified Source Versions.
+
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+  6. Conveying Non-Source Forms.
+
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+  7. Additional Terms.
+
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+  8. Termination.
+
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+  9. Acceptance Not Required for Having Copies.
+
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+  10. Automatic Licensing of Downstream Recipients.
+
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+  11. Patents.
+
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+  12. No Surrender of Others' Freedom.
+
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+  13. Use with the GNU Affero General Public License.
+
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+  14. Revised Versions of this License.
+
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+  If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+  15. Disclaimer of Warranty.
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program>  Copyright (C) <year>  <name of author>
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
diff --git a/GNU AFFERO GPL V3 b/GNU AFFERO GPL V3
new file mode 100644
index 0000000..dba13ed
--- /dev/null
+++ b/GNU AFFERO GPL V3
@@ -0,0 +1,661 @@
+                    GNU AFFERO GENERAL PUBLIC LICENSE
+                       Version 3, 19 November 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU Affero General Public License is a free, copyleft license for
+software and other kinds of works, specifically designed to ensure
+cooperation with the community in the case of network server software.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+our General Public Licenses are intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  Developers that use our General Public Licenses protect your rights
+with two steps: (1) assert copyright on the software, and (2) offer
+you this License which gives you legal permission to copy, distribute
+and/or modify the software.
+
+  A secondary benefit of defending all users' freedom is that
+improvements made in alternate versions of the program, if they
+receive widespread use, become available for other developers to
+incorporate.  Many developers of free software are heartened and
+encouraged by the resulting cooperation.  However, in the case of
+software used on network servers, this result may fail to come about.
+The GNU General Public License permits making a modified version and
+letting the public access it on a server without ever releasing its
+source code to the public.
+
+  The GNU Affero General Public License is designed specifically to
+ensure that, in such cases, the modified source code becomes available
+to the community.  It requires the operator of a network server to
+provide the source code of the modified version running there to the
+users of that server.  Therefore, public use of a modified version, on
+a publicly accessible server, gives the public access to the source
+code of the modified version.
+
+  An older license, called the Affero General Public License and
+published by Affero, was designed to accomplish similar goals.  This is
+a different license, not a version of the Affero GPL, but Affero has
+released a new version of the Affero GPL which permits relicensing under
+this license.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU Affero General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+  The Corresponding Source for a work in source code form is that
+same work.
+
+  2. Basic Permissions.
+
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+  4. Conveying Verbatim Copies.
+
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+  5. Conveying Modified Source Versions.
+
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+  6. Conveying Non-Source Forms.
+
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+  7. Additional Terms.
+
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+  8. Termination.
+
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+  9. Acceptance Not Required for Having Copies.
+
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+  10. Automatic Licensing of Downstream Recipients.
+
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+  11. Patents.
+
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+  12. No Surrender of Others' Freedom.
+
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+  13. Remote Network Interaction; Use with the GNU General Public License.
+
+  Notwithstanding any other provision of this License, if you modify the
+Program, your modified version must prominently offer all users
+interacting with it remotely through a computer network (if your version
+supports such interaction) an opportunity to receive the Corresponding
+Source of your version by providing access to the Corresponding Source
+from a network server at no charge, through some standard or customary
+means of facilitating copying of software.  This Corresponding Source
+shall include the Corresponding Source for any work covered by version 3
+of the GNU General Public License that is incorporated pursuant to the
+following paragraph.
+
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the work with which it is combined will remain governed by version
+3 of the GNU General Public License.
+
+  14. Revised Versions of this License.
+
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU Affero General Public License from time to time.  Such new versions
+will be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU Affero General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU Affero General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+  If the Program specifies that a proxy can decide which future
+versions of the GNU Affero General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+  15. Disclaimer of Warranty.
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If your software can interact with users remotely through a computer
+network, you should also make sure that it provides a way for users to
+get its source.  For example, if your program is a web application, its
+interface could display a "Source" link that leads users to an archive
+of the code.  There are many ways you could offer source, and different
+solutions will be better for different programs; see section 13 for the
+specific requirements.
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU AGPL, see
+<http://www.gnu.org/licenses/>.
diff --git a/PKG-INFO b/PKG-INFO
new file mode 100644
index 0000000..e848787
--- /dev/null
+++ b/PKG-INFO
@@ -0,0 +1,80 @@
+Metadata-Version: 1.1
+Name: PyMuPDF
+Version: 1.17.4
+Author: Ruikai Liu
+Author-email: lrk700@gmail.com
+Maintainer: Jorj X. McKie
+Maintainer-email: jorj.x.mckie@outlook.de
+Home-page: https://github.com/pymupdf/PyMuPDF
+Download-url: https://github.com/pymupdf/PyMuPDF
+Summary: PyMuPDF is a Python binding for the PDF rendering library MuPDF
+Description:
+        Release date: July 31, 2020
+        
+        Authors
+        =======
+        
+        * Jorj X. McKie
+        * Ruikai Liu
+        
+        Introduction
+        ============
+        
+        This is **version 1.17.4 of PyMuPDF**, a Python binding for `MuPDF <http://mupdf.com/>`_ - "a lightweight PDF and XPS viewer".
+        
+        MuPDF can access files in PDF, XPS, OpenXPS, epub, comic and fiction book formats, and it is known for both its top performance and its high rendering quality.
+        
+        With PyMuPDF you can therefore access files with extensions ``*.pdf``, ``*.xps``, ``*.oxps``, ``*.epub``, ``*.cbz`` or ``*.fb2`` from your Python scripts. A number of popular image formats are supported as well, including multi-page TIFF images.
+        
+        PyMuPDF should run on all platforms that are supported by both MuPDF and Python. These include, but are not limited to, Windows (XP/SP2 and up), Mac OSX and Linux, 32-bit or 64-bit. If you can generate MuPDF on a platform supported by Python, then PyMuPDF can be used there as well.
+        
+        PyMuPDF is hosted on `GitHub <https://github.com/pymupdf/PyMuPDF>`_ where you can find up-to-date information on its features, our `issue tracker <https://github.com/pymupdf/PyMuPDF/issues>`_, `Wikis <https://github.com/pymupdf/PyMuPDF/wiki>`_ and much more.
+        
+        Installation
+        ============
+        
+        For all MS Windows versions as well as popular Mac OSX and Linux versions, we provide Python wheels - see the download section of this site and the current `release directory <https://github.com/pymupdf/PyMuPDF/releases/latest>`_ of our home page. Other platforms need to download and generate the MuPDF library first and then set up PyMuPDF. Do visit our GitHub home, which has more details on this, including the latest bugfixes, pre-releases, etc.
+        
+        Usage and Documentation
+        ========================
+        
+        For all document types you can render pages in raster (PNG) or vector (SVG) formats, extract text and access meta information, links, annotations and bookmarks, as well as decrypt the document. For PDF files, these objects can also be created, modified or deleted. Plus you can rotate, re-arrange, duplicate, create, or delete pages and join or split documents.
+        
+        Starting with version 1.16.0, PDF password protection is **fully supported**: passwords, encryption methods and permission levels can be set, changed or removed.
+        
+        Specifically for PDF files, PyMuPDF provides update access to low-level structure information, supports handling of embedded files and modification of page contents (like inserting images, fonts, text, annotations and drawings).
+        
+        Other features include embedding vector images (SVG, PDF) such as logos or watermarks, joining or splitting single PDF pages (including things like posterizing and 2-up / 4-up processing).
+        
+        You can also create **PDF Form fields** with support for text, checkbox, listbox and combobox widgets.
+        
+        Our home page provides many examples and How-Tos for all of this. At a minimum, read the tutorial and the recipes sections of our documentation.
+        
+        Written using **Sphinx**, documentation is available here:
+        
+        * View it online at `Read The Docs <https://pymupdf.readthedocs.io/en/latest/>`_. For **best quality downloads**, use the following links.
+        
+        * `HTML <https://github.com/pymupdf/PyMuPDF/tree/master/doc/html.zip>`_
+        
+        * `Windows CHM <https://github.com/JorjMcKie/PyMuPDF-optional-material/tree/master/doc/PyMuPDF.chm>`_
+        
+        * `PDF <https://github.com/pymupdf/PyMuPDF/tree/master/doc/pymupdf.pdf>`_
+        
+
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
+Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
+Classifier: Operating System :: MacOS
+Classifier: Operating System :: Microsoft :: Windows
+Classifier: Operating System :: POSIX :: Linux
+Classifier: Programming Language :: C
+Classifier: Programming Language :: Python :: 2.7
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.4
+Classifier: Programming Language :: Python :: 3.5
+Classifier: Programming Language :: Python :: 3.6
+Classifier: Programming Language :: Python :: 3.7
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Topic :: Utilities
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..084c078
--- /dev/null
+++ b/README.md
@@ -0,0 +1,106 @@
+# PyMuPDF 1.17.4
+
+![logo](https://github.com/pymupdf/PyMuPDF/blob/master/demo/pymupdf.jpg)
+
+Release date: July 31, 2020
+
+**Travis-CI:** [![Build Status](https://travis-ci.org/JorjMcKie/py-mupdf.svg?branch=master)](https://travis-ci.org/JorjMcKie/py-mupdf)
+
+On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [![](https://pepy.tech/badge/pymupdf)](https://pepy.tech/project/pymupdf)
+
+# Authors
+* [Jorj X. McKie](mailto:jorj.x.mckie@outlook.de)
+* [Ruikai Liu](mailto:lrk700@gmail.com)
+
+# Introduction
+
+This is **version 1.17.4 of PyMuPDF**, a Python binding with support for [MuPDF 1.17.*](http://mupdf.com/) - "a lightweight PDF, XPS, and E-book viewer".
+
+MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.
+
+With PyMuPDF you can access files with extensions like ".pdf", ".xps", ".oxps", ".cbz", ".fb2" or ".epub". In addition, about 10 popular image formats can also be opened and handled like documents.
+
+
+# Usage and Documentation
+For all supported document types (i.e. **_including images_**) you can
+* decrypt the document
+* access meta information, links and bookmarks
+* render pages in raster formats (PNG and some others), or the vector format SVG
+* search for text
+* extract text and images
+* convert to other formats: PDF, (X)HTML, XML, JSON, text
+
+> To some degree, PyMuPDF can therefore be used as an [image converter](https://github.com/pymupdf/PyMuPDF/wiki/How-to-Convert-Images): it can read a range of input formats and can produce **Portable Network Graphics (PNG)**, **Portable Anymaps** (**PNM**, etc.), **Portable Arbitrary Maps (PAM)**, **Adobe Postscript** and **Adobe Photoshop** documents, making the use of other graphics packages obsolete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well.
+
+**PDF documents** can be created, joined or split up. Pages can be inserted, deleted, re-arranged or modified in many ways (including annotations and form fields).
+
+* Images and fonts can be extracted or inserted.
+* Embedded files are fully supported.
+* PDFs can be reformatted to support double-sided printing, posterizing, and applying logos or watermarks.
+* Password protection is fully supported: decryption, encryption, encryption method selection, permission level and user / owner password setting.
+* Low-level PDF structures can be accessed and modified.
+* PyMuPDF can also be used as a **module in the command line** using ``"python -m fitz ..."``. This is a versatile utility, which we will further develop going forward. It currently supports the following operations on PDF documents:
+
+    - **encryption / decryption / optimization**
+    - creating **sub-documents**
+    - document **joining**
+    - **image / font extraction**
+    - full support of **embedded files**.
+
+
+Have a look at the basic [demos](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo), the [examples](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples) (which contain complete, working programs), and the **recipes** section of our [Wiki](https://github.com/pymupdf/PyMuPDF/wiki) sidebar, which contains more than a dozen How-To-style guides.
+
+Our **documentation**, written using Sphinx, is available in various formats from the following sources. It is currently a combination of a reference guide and a user manual. For a **quick start** look at the [tutorial](https://pymupdf.readthedocs.io/en/latest/tutorial/) and the [recipes](https://pymupdf.readthedocs.io/en/latest/faq/) chapters.
+
+* You can view it online at [Read the Docs](https://readthedocs.org/projects/pymupdf/). This site also provides download options for zipped HTML and PDF.
+* Find a Windows help file [here](https://github.com/pymupdf/PyMuPDF-optional-material/tree/master/doc/PyMuPDF.chm).
+
+
+# Installation
+
+For the major **Windows** and (thanks to our user **@jbarlow83**!) **Mac OSX** or **Linux** versions we offer wheels in the [download section of PyPI](https://pypi.org/project/PyMuPDF/#files). This includes Python 2.7 and Python versions 3.5 through 3.8.
+
+For other Python versions or operating systems you need to generate PyMuPDF yourself as follows. This should work for all platforms which support Python and MuPDF. In any case you need the development version of Python.
+
+To do this, you must download and generate MuPDF. This process depends very much on your system. For most platforms, the MuPDF source contains prepared procedures for achieving this. Please observe the following general steps:
+
+* Be sure to download the official MuPDF source release from [here](https://mupdf.com/downloads/archive).
+
+* Do **not use** MuPDF's [GitHub repo](https://github.com/ArtifexSoftware/mupdf). It contains their current **development source**, which is **not compatible** with this PyMuPDF version.
+
+* This repo's `fitz` folder contains one or more files whose names start with a single underscore `"_"`. These files contain configuration data and hotfixes. Each one must be copy-renamed to its correct target location **inside the MuPDF source** that you have downloaded, **before you generate MuPDF**. Currently, these files are:
+  - fitz configuration file `_config.h` copy-replace to: `mupdf/include/mupdf/fitz/config.h`. It contains configuration data, e.g. which fonts to support.
+
+  - Now MuPDF can be generated.
+
+* Since PyMuPDF v1.14.17, the sources provided in this repository **no longer contain** the interface files ``fitz.py`` and ``fitz_wrap.c`` - they are instead generated **"on the fly"** by ``setup.py`` using the interface generator [SWIG](http://www.swig.org/). So SWIG must be installed on your system. Please refer to issue #312 for some background.
+    - PyMuPDF wheels have been generated using **SWIG v4.0.1**.
+
+
+* If you do **not use SWIG**, please download the **sources from PyPI** - they continue to contain those generated files, so installation should work like any other Python extension generation on your system.
+
+Once this is done, adjust directories in ``setup.py`` and run ``python setup.py install``.
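The copy-rename of `_config.h` described in the steps above can also be scripted, which helps when MuPDF is rebuilt repeatedly. This is a minimal sketch, assuming `pymupdf_root` is a checkout of this repository (containing the `fitz` folder) and `mupdf_root` is the unpacked MuPDF source release, i.e. the leading `mupdf/` directory of the target path. The function name `apply_config` is hypothetical and not part of PyMuPDF's `setup.py`.

```python
import shutil
from pathlib import Path


def apply_config(pymupdf_root, mupdf_root):
    """Hypothetical helper: copy fitz/_config.h over MuPDF's fitz/config.h.

    Must run BEFORE MuPDF is generated, as described in the steps above.
    """
    src = Path(pymupdf_root) / "fitz" / "_config.h"
    dst = Path(mupdf_root) / "include" / "mupdf" / "fitz" / "config.h"
    if not src.is_file():
        raise FileNotFoundError(f"missing PyMuPDF config file: {src}")
    if not dst.parent.is_dir():
        # Guard against pointing at something that is not a MuPDF source tree.
        raise FileNotFoundError(f"not a MuPDF source tree: {mupdf_root}")
    shutil.copyfile(src, dst)  # replaces MuPDF's original config.h
    return dst
```

Run it once per freshly unpacked MuPDF source release, then generate MuPDF as usual.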
+
+The following sections contain further comments for some platforms.
+
+## Ubuntu
+Our users (thanks to **@gileadslostson** and **@jbarlow83**!) have documented their MuPDF installation experiences from sources in this [Wiki page](https://github.com/pymupdf/PyMuPDF/wiki/Ubuntu-Installation-Experience).
+
+## OSX
+First, install the MuPDF headers and libraries, which are provided by mupdf-tools: ``brew install mupdf-tools``.
+
+Then you might need to ``export ARCHFLAGS='-arch x86_64'``, since ``libmupdf.a`` is for x86_64 only.
+
+Finally, please double check ``setup.py`` before building. Update ``include_dirs`` and ``library_dirs`` if necessary.
+
+## MS Windows
+If you are looking to make your own binary, consult this [Wiki page](https://github.com/pymupdf/PyMuPDF/wiki/Windows-Binaries-Generation). It explains in considerable detail how to use Visual Studio for generating MuPDF.
+
+# Earlier Versions
+Earlier versions are available in the [releases](https://github.com/pymupdf/PyMuPDF/releases) directory.
+
+# License
+PyMuPDF is distributed under GNU GPL V3. Because you will implicitly also be using MuPDF, its license GNU AFFERO GPL V3 applies as well. Copies of both are included in this repository.
+
+# Contact
+Please submit questions, comments or issues [here](https://github.com/pymupdf/PyMuPDF/issues), or directly contact the authors via their e-mail addresses.
diff --git a/demo/pymupdf.jpg b/demo/pymupdf.jpg
new file mode 100644
index 0000000..184a2d6
Binary files /dev/null and b/demo/pymupdf.jpg differ
diff --git a/docs/PyMuPDF.ico b/docs/PyMuPDF.ico
new file mode 100644
index 0000000..db1809d
Binary files /dev/null and b/docs/PyMuPDF.ico differ
diff --git a/docs/algebra.rst b/docs/algebra.rst
new file mode 100644
index 0000000..bc38195
--- /dev/null
+++ b/docs/algebra.rst
@@ -0,0 +1,199 @@
+.. _Algebra:
+
+Operator Algebra for Geometry Objects
+======================================
+
+.. highlight:: python
+
+Instances of classes :ref:`Point`, :ref:`IRect`, :ref:`Rect` and :ref:`Matrix` are collectively also called "geometry" objects.
+
+They are all special cases of Python sequences; see :ref:`SequenceTypes` for more background.
+
+We have defined operators for these classes that allow dealing with them (almost) like ordinary numbers in terms of addition, subtraction, multiplication, division, and some others.
+
+This chapter is a synopsis of what is possible.
+
+General Remarks
+-----------------
+1. Operators can be either **binary** (i.e. involving two objects) or **unary**.
+
+2. The resulting type of **binary** operations is either a **new object of the left operand's class** or a bool.
+
+3. The result of **unary** operations is either a **new object** of the same class, a bool or a float.
+
+4. The binary operators *+, -, *, /* are defined for all classes. They *roughly* do what you would expect -- **except that the second operand ...**
+
+    - may always be a number which then performs the operation on every component of the first one,
+    - may always be a numeric sequence of the same length (2, 4 or 6) -- we call such sequences :data:`point_like`, :data:`rect_like` or :data:`matrix_like`, respectively.
+
+5. Rectangles support additional binary operations: **intersection** (operator *"&"*), **union** (operator *"|"*) and **containment** checking.
+
+6. Binary operators fully support in-place operations, so expressions like *"a /= b"* are valid if b is numeric or "a-like".
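To make the operand rules above concrete, here is a minimal pure-Python sketch using a hypothetical `MiniPoint` class. It is not part of PyMuPDF and not how PyMuPDF implements `Point`. It shows the second operand being either a number applied to every component or a `point_like` sequence of the same length, and that each operation returns a new object of the left operand's class.

```python
from numbers import Number


class MiniPoint:
    """Hypothetical 2D point illustrating the operand rules; not PyMuPDF code."""

    def __init__(self, x, y):
        self.x, self.y = float(x), float(y)

    def __iter__(self):
        # Iteration makes a MiniPoint itself usable as a "point_like" operand.
        return iter((self.x, self.y))

    def __len__(self):
        return 2

    def _components(self, other):
        # A number is applied to every component; otherwise the operand
        # must be a numeric sequence of the same length (point_like).
        if isinstance(other, Number):
            return (other, other)
        other = tuple(other)
        if len(other) != len(self):
            raise ValueError("second operand must be a number or point_like")
        return other

    def __add__(self, other):
        ox, oy = self._components(other)
        return MiniPoint(self.x + ox, self.y + oy)  # always a new object

    def __sub__(self, other):
        ox, oy = self._components(other)
        return MiniPoint(self.x - ox, self.y - oy)

    def __truediv__(self, other):
        # Numbers only: for real points, "/" with a sequence means a matrix.
        if not isinstance(other, Number):
            raise TypeError("this sketch only divides by numbers")
        return MiniPoint(self.x / other, self.y / other)

    def __repr__(self):
        return f"MiniPoint({self.x}, {self.y})"


p1, p2 = MiniPoint(1, 2), MiniPoint(4711, 3141)
print(p1 + (p2 - p1) / 2)  # MiniPoint(2356.0, 1571.5)
```

Because the sketch defines no `__iadd__`, Python falls back to `__add__` for `p += (1, 1)` and rebinds `p` to the new object, so any other reference to the old point is left unchanged.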
+
+
+Unary Operations
+------------------
+
+=========== ===================================================================
+Oper.       Result
+=========== ===================================================================
+ bool(OBJ)  is false exactly if all components of OBJ are zero
+ abs(OBJ)   the rectangle area -- equal to norm(OBJ) for the other types
+ norm(OBJ)  square root of the component squares (Euclidean norm)
+ +OBJ       new copy of OBJ
+ -OBJ       new copy of OBJ with negated components
+ ~m         inverse of matrix "m", or the null matrix if not invertible
+=========== ===================================================================
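The definitions of `bool`, `abs` and `norm` in this table can be checked with plain tuples. The following is a sketch on ordinary Python sequences, not PyMuPDF code; the function names are hypothetical, and treating the area of an empty or inverted rectangle as 0 is an assumption of this sketch only.

```python
import math


def is_nonzero(obj):
    """Like bool(OBJ): false exactly if all components are zero."""
    return any(c != 0 for c in obj)


def norm(obj):
    """Like norm(OBJ): square root of the sum of the component squares."""
    return math.sqrt(sum(c * c for c in obj))


def rect_area(rect):
    """Like abs(rect): the area (assumed 0 if width or height is not positive)."""
    x0, y0, x1, y1 = rect
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def negate(obj):
    """Like -OBJ: a new sequence with negated components."""
    return tuple(-c for c in obj)


page_rect = (0.0, 0.0, 595.0, 842.0)
print(rect_area(page_rect))  # 500990.0
print(norm((3, 4)))          # 5.0, which the table also gives as abs() of a point
```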
+
+
+Binary Operations
+------------------
+For every geometry object "a" and every number "b", the operations "a ° b" and "a °= b" are always defined for the operators *+, -, *, /*. The respective operation is simply executed for each component of "a". If the **second operand is not a number**, then the following is defined:
+
+========= =======================================================================
+Oper.     Result
+========= =======================================================================
+a+b, a-b  component-wise execution, "b" must be "a-like".
+a*m, a/m  "a" can be a point, rectangle or matrix, but "m" must be
+          :data:`matrix_like`. *"a/m"* is treated as *"a*~m"* (see note below
+          for non-invertible matrices). If "a" is a **point** or a **rectangle**,
+          then *"a.transform(m)"* is executed. If "a" is a matrix, then
+          matrix concatenation takes place.
+a&b       **intersection rectangle:** "a" must be a rectangle and
+          "b" :data:`rect_like`. Delivers the **largest rectangle**
+          contained in both operands.
+a|b       **union rectangle:** "a" must be a rectangle, and "b" may be
+          :data:`point_like` or :data:`rect_like`.
+          Delivers the **smallest rectangle** containing both operands.
+b in a    if "b" is a number, then *"b in tuple(a)"* is returned.
+          If "b" is :data:`point_like` or :data:`rect_like`, then "a"
+          must be a rectangle, and *"a.contains(b)"* is returned.
+a == b    *True* if *bool(a-b)* is *False* ("b" may be "a-like").
+========= =======================================================================
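The rectangle-specific operators `&`, `|` and `in` from this table can be sketched on plain `(x0, y0, x1, y1)` tuples. This is an illustration, not PyMuPDF's implementation: the function names are hypothetical, and returning `None` for non-overlapping rectangles and counting boundary points as contained are choices made for this sketch only.

```python
def intersect(a, b):
    """Like a & b: largest rectangle contained in both (None if no overlap)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None  # assumption of this sketch
    return (x0, y0, x1, y1)


def include(a, b):
    """Like a | b: smallest rectangle containing a and point_like/rect_like b."""
    if len(b) == 2:  # treat a point as a degenerate rectangle
        b = (b[0], b[1], b[0], b[1])
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def contains(a, b):
    """Like b in a: number membership, or point/rectangle containment."""
    if isinstance(b, (int, float)):
        return b in tuple(a)
    if len(b) == 2:
        b = (b[0], b[1], b[0], b[1])
    # Edges count as inside here (an assumption of this sketch).
    return a[0] <= b[0] <= b[2] <= a[2] and a[1] <= b[1] <= b[3] <= a[3]


# Smallest rectangle enclosing a 10 x 10 grid of points, starting at (0, 0).
r = (0, 0, 0, 0)
for i in range(10):
    for j in range(10):
        r = include(r, (i, j))
print(r)                                               # (0, 0, 9, 9)
print(contains(r, (4, 5)), contains(r, (4, 4, 5, 5)))  # True True
```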
+
+
+.. note:: Please note an important difference from usual arithmetic:
+
+        Matrix multiplication is **not commutative**, i.e. in general we have *m*n != n*m* for two matrices. Also, there are non-zero matrices which have no inverse, for example *m = Matrix(1, 0, 1, 0, 1, 0)*. If you try to divide by any of these using operator *"/"*, you will receive a *ZeroDivisionError* exception, e.g. for *fitz.Identity / m*. But if you formulate *fitz.Identity * ~m*, the result will be *fitz.Matrix()* (the null matrix).
+
+        Admittedly, this represents an inconsistency, and we are considering removing it. For the time being, you can either avoid the exception by checking whether *~m* is the null matrix, or accept a potential *ZeroDivisionError* by using *fitz.Identity / m*.
+
+
+Some Examples
+--------------
+
+Manipulation with numbers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For the usual arithmetic operations, numbers are always allowed as second operand. In addition, you can formulate *"x in OBJ"*, where x is a number. It is implemented as *"x in tuple(OBJ)"*::
+
+  >>> fitz.Rect(1, 2, 3, 4) + 5
+  Rect(6.0, 7.0, 8.0, 9.0)
+  >>> 3 in fitz.Rect(1, 2, 3, 4)
+  True
+  >>> 
+
+The following will create the upper left quarter of a document page rectangle::
+
+  >>> page.rect
+  Rect(0.0, 0.0, 595.0, 842.0)
+  >>> page.rect / 2
+  Rect(0.0, 0.0, 297.5, 421.0)
+  >>> 
+
+The following will deliver the **middle point of a line** connecting two points **p1** and **p2**::
+
+  >>> p1 = fitz.Point(1, 2)
+  >>> p2 = fitz.Point(4711, 3141)
+  >>> mp = p1 + (p2 - p1) / 2
+  >>> mp
+  Point(2356.0, 1571.5)
+  >>> 
+
+Manipulation with "like" Objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The second operand of a binary operation can always be "like" the left operand. "Like" in this context means "a sequence of numbers of the same length". With the above examples::
+
+  >>> p1 + p2
+  Point(4712.0, 3143.0)
+  >>> p1 + (4711, 3141)
+  Point(4712.0, 3143.0)
+  >>> p1 += (4711, 3141)
+  >>> p1
+  Point(4712.0, 3143.0)
+  >>> 
+
+To shift a rectangle by 5 pixels to the right, do this::
+
+  >>> fitz.Rect(100, 100, 200, 200) + (5, 0, 5, 0)  # add 5 to the x coordinates
+  Rect(105.0, 100.0, 205.0, 200.0)
+  >>>
+
+Points, rectangles and matrices can be *transformed* with matrices. In PyMuPDF, we treat this like a **"multiplication"** (or a **"division"**, respectively), where the second operand may be "like" a matrix. Division in this context means "multiplication with the inverted matrix"::
+
+  >>> m = fitz.Matrix(1, 2, 3, 4, 5, 6)
+  >>> n = fitz.Matrix(6, 5, 4, 3, 2, 1)
+  >>> p = fitz.Point(1, 2)
+  >>> p * m
+  Point(12.0, 16.0)
+  >>> p * (1, 2, 3, 4, 5, 6)
+  Point(12.0, 16.0)
+  >>> p / m
+  Point(2.0, -2.0)
+  >>> p / (1, 2, 3, 4, 5, 6)
+  Point(2.0, -2.0)
+  >>>
+  >>> m * n  # matrix multiplication
+  Matrix(14.0, 11.0, 34.0, 27.0, 56.0, 44.0)
+  >>> m / n  # matrix division
+  Matrix(2.5, -3.5, 3.5, -4.5, 5.5, -7.5)
+  >>>
+  >>> m / m  # result is equal to the Identity matrix
+  Matrix(1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
+  >>>
+  >>> # look at this non-invertible matrix:
+  >>> m = fitz.Matrix(1, 0, 1, 0, 1, 0)
+  >>> ~m
+  Matrix(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
+  >>> # we try dividing by it in two ways:
+  >>> p = fitz.Point(1, 2)
+  >>> p * ~m  # this delivers point (0, 0):
+  Point(0.0, 0.0)
+  >>> p / m  # but this is an exception:
+  Traceback (most recent call last):
+    File "<pyshell#6>", line 1, in <module>
+      p / m
+    File "... /site-packages/fitz/fitz.py", line 869, in __truediv__
+      raise ZeroDivisionError("matrix not invertible")
+  ZeroDivisionError: matrix not invertible
+  >>>
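
In PDF, a matrix (a, b, c, d, e, f) maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The plain-Python sketch below spells out these formulas so that the numbers in the session above can be checked without PyMuPDF. The helpers transform, concat, invert and divide are hypothetical names, not PyMuPDF functions; invert mimics *~m* by returning the zero matrix for a non-invertible matrix, while divide raises ZeroDivisionError as *p / m* does.

```python
def transform(p, m):
    """Apply matrix m to point p (the equivalent of p * m)."""
    x, y = p
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def concat(m, n):
    """Matrix product m * n: first apply m, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def invert(m):
    """Inverse of m, or the zero matrix if m is not invertible (like ~m)."""
    a, b, c, d, e, f = m
    det = a * d - b * c
    if det == 0:
        return (0.0,) * 6
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, ic, id_, -(e * ia + f * ic), -(e * ib + f * id_))


def divide(p_or_m, m):
    """p / m or n / m: multiply with the inverse, failing for singular m."""
    a, b, c, d, _, _ = m
    if a * d - b * c == 0:
        raise ZeroDivisionError("matrix not invertible")
    inv = invert(m)
    if len(p_or_m) == 2:           # a point-like
        return transform(p_or_m, inv)
    return concat(p_or_m, inv)     # a matrix-like


m = (1, 2, 3, 4, 5, 6)
n = (6, 5, 4, 3, 2, 1)
print(transform((1, 2), m))   # (12, 16)
print(divide((1, 2), m))      # (2.0, -2.0)
print(concat(m, n))           # (14, 11, 34, 27, 56, 44)
print(divide(m, n))           # (2.5, -3.5, 3.5, -4.5, 5.5, -7.5)
```

Note that concat(m, n) is not the same as concat(n, m): the order in which transformations are applied matters.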
+
+
+In addition, rectangles support the following binary operations:
+
+* **intersection** -- the common area of rectangle-likes, operator *"&"*
+* **inclusion** -- enlarge to include a point-like or rect-like, operator *"|"*
+* **containment** check -- whether a point-like or rect-like is inside, operator *"in"*
+
+Here is an example of creating the smallest rectangle enclosing given points::
+
+  >>> # first define some point-likes
+  >>> points = []
+  >>> for i in range(10):
+          for j in range(10):
+              points.append((i, j))
+  >>>
+  >>> # now create a rectangle containing all these 100 points
+  >>> # start with an empty rectangle
+  >>> r = fitz.Rect(points[0], points[0])
+  >>> for p in points[1:]:  # and include remaining points one by one
+          r |= p
+  >>> r  # here is the expected result:
+  Rect(0.0, 0.0, 9.0, 9.0)
+  >>> (4, 5) in r  # this point-like lies inside the rectangle
+  True
+  >>> # and this rect-like is also inside
+  >>> (4, 4, 5, 5) in r
+  True
+  >>>
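
Including the points one at a time with *|=* amounts to taking the minimum and maximum of their coordinates. The plain-Python sketch below computes the same bounding rectangle directly and adds a simple containment test. The helper names are hypothetical, and treating points on the border as inside is an assumption of this sketch; PyMuPDF's own edge rules may differ.

```python
def bounding_rect(points):
    """Smallest rectangle (x0, y0, x1, y1) enclosing all given points."""
    xs = [float(x) for x, _ in points]
    ys = [float(y) for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def contains(rect, item):
    """Check whether a point-like (x, y) or rect-like (x0, y0, x1, y1) lies in rect."""
    x0, y0, x1, y1 = rect
    if len(item) == 2:
        x, y = item
        return x0 <= x <= x1 and y0 <= y <= y1
    a0, b0, a1, b1 = item
    return x0 <= a0 and y0 <= b0 and a1 <= x1 and b1 <= y1


points = [(i, j) for i in range(10) for j in range(10)]
r = bounding_rect(points)
print(r)                          # (0.0, 0.0, 9.0, 9.0)
print(contains(r, (4, 5)))        # True
print(contains(r, (4, 4, 5, 5)))  # True
```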
+
diff --git a/docs/annot.rst b/docs/annot.rst

new file mode 100644

index 0000000..c7a143a
--- /dev/null
+++ b/docs/annot.rst
@@ -0,0 +1,418 @@
+
+.. _Annot:
+
+================
+Annot
+================
+**This class is supported for PDF documents only.**
+
+Quote from the :ref:`AdobeManual`: "An annotation associates an object such as a note, sound, or movie with a location on a page of a PDF document, or provides a way to interact with the user by means of the mouse and keyboard."
+
+There is a parent-child relationship between an annotation and its page. If the page object becomes unusable (closed document, any document structure change, etc.), then so does each of its existing annotation objects: whenever an annotation property or method is then accessed, an exception is raised saying that the object is "orphaned".
+
+
+=============================== ==============================================================
+**Attribute**                   **Short Description**
+=============================== ==============================================================
+:meth:`Annot.blendMode`         return the annotation's blend mode
+:meth:`Annot.setBlendMode`      set the annotation's blend mode
+:meth:`Annot.delete_responses`  delete all responding annotations
+:meth:`Annot.fileGet`           return attached file content
+:meth:`Annot.fileInfo`          return attached file information
+:meth:`Annot.fileUpd`           set new content of the attached file
+:meth:`Annot.getPixmap`         image of the annotation as a pixmap
+:meth:`Annot.setBorder`         change the border
+:meth:`Annot.setColors`         change the colors
+:meth:`Annot.setFlags`          change the flags
+:meth:`Annot.setInfo`           change various properties
+:meth:`Annot.setLineEnds`       set line ending styles
+:meth:`Annot.setName`           change the "Name" field (e.g. icon name)
+:meth:`Annot.setOpacity`        change transparency
+:meth:`Annot.setRect`           change the rectangle
+:meth:`Annot.setRotation`       change rotation
+:meth:`Annot.update`            apply accumulated annot changes
+:attr:`Annot.border`            border details
+:attr:`Annot.colors`            border / background and fill colors
+:attr:`Annot.flags`             annotation flags
+:attr:`Annot.info`              various information
+:attr:`Annot.lineEnds`          start / end appearance of line-type annotations
+:attr:`Annot.next`              link to the next annotation
+:attr:`Annot.opacity`           the annot's transparency
+:attr:`Annot.parent`            page object of the annotation
+:attr:`Annot.rect`              rectangle containing the annotation
+:attr:`Annot.rotation`          annotation rotation
+:attr:`Annot.type`              type of the annotation
+:attr:`Annot.vertices`          point coordinates of Polygons, PolyLines, etc.
+:attr:`Annot.xref`              the PDF :data:`xref` number
+=============================== ==============================================================
+
+**Class API**
+
+.. class:: Annot
+
+   .. index::
+      pair: matrix; getPixmap
+      pair: colorspace; getPixmap
+      pair: alpha; getPixmap
+
+   .. method:: getPixmap(matrix=fitz.Identity, colorspace=fitz.csRGB, alpha=False)
+
+      Creates a pixmap from the annotation as it appears on the page in untransformed coordinates. The pixmap's :ref:`IRect` equals *Annot.rect.irect* (see below).
+
+      :arg matrix_like matrix: a matrix to be used for image creation. Default is the *fitz.Identity* matrix.
+
+      :arg colorspace: a colorspace to be used for image creation. Default is *fitz.csRGB*.
+      :type colorspace: :ref:`Colorspace`
+
+      :arg bool alpha: whether to include transparency information. Default is *False*.
+
+      :rtype: :ref:`Pixmap`
+
+      .. note:: If the annotation has just been created or modified, you should reload the page first via *page = doc.reload_page(page)*.
+
+   .. method:: setInfo(info=None, content=None, title=None, creationDate=None, modDate=None, subject=None)
+
+      *(Changed in version 1.16.10)*
+
+      Changes annotation properties. These include dates, contents, subject and author (title). Changes for *name* and *id* will be ignored. The update happens selectively: To leave a property unchanged, set it to *None*. To delete existing data, use an empty string.
+
+      :arg dict info: a dictionary compatible with the *info* property (see below). All entries must be strings. If this argument is not a dictionary, the other arguments are used instead -- else they are ignored.
+      :arg str content: *(new in v1.16.10)* see description in :attr:`info`.
+      :arg str title: *(new in v1.16.10)* see description in :attr:`info`.
+      :arg str creationDate: *(new in v1.16.10)* date of annot creation. If given, should be in PDF datetime format.
+      :arg str modDate: *(new in v1.16.10)* date of last modification. If given, should be in PDF datetime format.
+      :arg str subject: *(new in v1.16.10)* see description in :attr:`info`.
+
+   .. method:: setLineEnds(start, end)
+
+      Sets an annotation's line ending styles. Each of the supported annotation types (see the note below) is defined by a list of points connected by lines. The symbol identified by *start* is attached to the first point, and *end* to the last point of this list. For unsupported annotation types, the method is a no-op and issues a warning.
+
+      .. note::
+
+         * While only 'FreeText', 'Line', 'PolyLine', and 'Polygon' annotations can have these properties, (Py-) MuPDF does not support line ends for 'FreeText', because the call-out variant for these is not supported.
+         * *(Changed in v1.16.16)* Some symbols have an interior area (diamonds, circles, squares, etc.). By default, these areas are filled with the fill color of the annotation. If this is *None*, then white is chosen. The *fill_color* argument of :meth:`Annot.update` can now be used to override this.
+
+      :arg int start: The symbol number for the first point.
+      :arg int end: The symbol number for the last point.
+
+   .. method:: setOpacity(value)
+
+      Set the annotation's transparency. Opacity can also be set in :meth:`Annot.update`.
+
+      :arg float value: a float in range *[0, 1]*. Any value outside this range is treated as 1. For example, a value of 0.5 sets the transparency to 50%.
+
+      Three overlapping 'Circle' annotations, each with opacity set to 0.5:
+
+      .. image:: images/img-opacity.jpg
+
+   .. method:: blendMode()
+
+      *(New in v1.16.14)* Return the annotation's blend mode. See :ref:`AdobeManual`, page 520 for explanations.
+
+      :rtype: str
+      :returns: the blend mode or *None*.
+
+         >>> annot=page.firstAnnot
+         >>> annot.blendMode()
+         'Multiply'
+
+
+   .. method:: setBlendMode(blend_mode)
+
+      *(New in v1.16.14)* Set the annotation's blend mode. See :ref:`AdobeManual`, page 520 for explanations. The blend mode can also be set in :meth:`Annot.update`.
+
+      :arg str blend_mode: set the blend mode. Use :meth:`Annot.update` to reflect this in the visual appearance. For predefined values see :ref:`BlendModes`. The best way to **remove** a special blend mode is choosing ``PDF_BM_Normal``.
+
+         >>> annot.setBlendMode(fitz.PDF_BM_Multiply)
+         >>> annot.update()
+         >>> # or in one statement:
+         >>> annot.update(blend_mode=fitz.PDF_BM_Multiply, ...)
+
+   .. method:: setName(name)
+
+      *(New in version 1.16.0)* Change the name field of any annotation type. For 'FileAttachment' and 'Text' annotations, this is the icon name, for 'Stamp' annotations the text in the stamp. The visual result (if any) depends on your PDF viewer. See also :ref:`mupdficons`.
+
+      :arg str name: the new name.
+
+      .. caution:: If you set the name of a 'Stamp' annotation, then this will **not change** the rectangle, nor will the text be laid out in any way. If you choose a standard text from :ref:`StampIcons` (the **exact** name piece after "STAMP_"), you should get the original layout. An **arbitrary text** will not be converted to upper case, but will be written as is in font "Times-Bold", horizontally centered in **one line**, and shortened to fit. To get your text fully displayed, its length using fontsize 20 must not exceed 190 pixels. So please make sure that the following inequality is true: ``fitz.getTextlength(text, fontname="tibo", fontsize=20) <= 190``.
+
+   .. method:: setRect(rect)
+
+      Change the rectangle of an annotation. The annotation can be moved around and both sides of the rectangle can be independently scaled. However, the annotation appearance will never get rotated, flipped or sheared.
+
+      :arg rect_like rect: the new rectangle of the annotation (finite and not empty). E.g. using a value of *annot.rect + (5, 5, 5, 5)* will shift the annot position 5 pixels to the right and downwards.
+
+      .. note:: You **need not** invoke :meth:`Annot.update` to activate the effect.
+
+
+   .. method:: setRotation(angle)
+
+      Set the rotation of an annotation. This rotates the annotation rectangle around its center point. Then a **new annotation rectangle** is calculated from the resulting quad.
+
+      :arg int angle: rotation angle in degrees. Arbitrary values are possible, but will be normalized to the interval 0 <= angle < 360.
+
+      .. note::
+        * You **must invoke** :meth:`Annot.update` to activate the effect.
+        * For PDF_ANNOT_FREE_TEXT, only one of the values 0, 90, 180 and 270 is possible and will **rotate the text** inside the current rectangle (which remains unchanged). Other values are silently ignored and replaced by 0.
+        * Otherwise, only the following :ref:`AnnotationTypes` can be rotated: 'Square', 'Circle', 'Caret', 'Text', 'FileAttachment', 'Ink', 'Line', 'PolyLine', 'Polygon', and 'Stamp'. For all others the method is a no-op.
+
+
+   .. method:: setBorder(border=None, width=0, style=None, dashes=None)
+
+      PDF only: Change border width and dashing properties.
+
+      *Changed in version 1.16.9:* Allow specification without using a dictionary. The direct parameters are used if *border* is not a dictionary.
+
+      :arg dict border: a dictionary as returned by the :attr:`border` property, with keys *"width"* (*float*), *"style"* (*str*) and *"dashes"* (*sequence*). Omitted keys leave the respective property unchanged. For example, to remove dashing, use *"dashes": []*. If *dashes* is not an empty sequence, *"style"* will automatically be set to "D" (dashed).
+
+      :arg float width: see above.
+      :arg str style: see above.
+      :arg sequence dashes: see above.
+
+   .. method:: setFlags(flags)
+
+      Changes the annotation flags. Use the *|* operator to combine several.
+
+      :arg int flags: an integer specifying the required flags.
+
+   .. method:: setColors(colors=None, stroke=None, fill=None)
+
+      Changes the "stroke" and "fill" colors for supported annotation types.
+
+      *Changed in version 1.16.9:* Allow colors to be directly set. These parameters are used if *colors* is not a dictionary.
+
+      :arg dict colors: a dictionary containing color specifications. For accepted dictionary keys and values see below. The most practical approach is to first make a copy of the *colors* property and then modify this dictionary as required.
+      :arg sequence stroke: see above.
+      :arg sequence fill: see above.
+
+
+   .. method:: delete_responses()
+
+      *(New in version 1.16.12)* Delete annotations referring to this one. This includes any 'Popup' annotations and all annotations responding to it.
+
+
+   .. index::
+      pair: blend_mode; update
+      pair: fontsize; update
+      pair: text_color; update
+      pair: border_color; update
+      pair: fill_color; update
+      pair: cross_out; update
+      pair: rotate; update
+
+   .. method:: update(opacity=None, blend_mode=None, fontsize=0, text_color=None, border_color=None, fill_color=None, cross_out=True, rotate=-1)
+
+      Synchronize the appearance of an annotation with its properties after any changes. 
+
+      You can safely omit this method **only** for the following changes:
+
+         * :meth:`setRect`
+         * :meth:`setFlags`
+         * :meth:`fileUpd`
+         * :meth:`setInfo` (except any changes to *"content"*)
+
+      All arguments are optional. *(Changed in v1.16.14)* Blend mode and opacity are applicable to **all annotation types**. The other arguments are mostly special use, as described below.
+
+      Color specifications may be made in the usual PyMuPDF format: sequences of floats ranging from 0.0 to 1.0 (inclusive). The sequence length must be 1, 3 or 4 (supporting GRAY, RGB and CMYK colorspaces respectively). For mono-color, a single float is also acceptable and yields a shade of gray.
+
+      :arg float opacity: *(new in v1.16.14)* **valid for all annotation types:** change or set the annotation's transparency. Valid values are *0 <= opacity < 1*.
+      :arg str blend_mode: *(new in v1.16.14)* **valid for all annotation types:** change or set the annotation's blend mode. For valid values see :ref:`BlendModes`.
+      :arg float fontsize: change font size of the text. 'FreeText' annotations only.
+      :arg sequence,float text_color: change the text color. 'FreeText' annotations only.
+      :arg sequence,float border_color: change the border color. 'FreeText' annotations only.
+      :arg sequence,float fill_color: the fill color.
+      
+          * 'FreeText' annotations: If you set (or leave) this to *None*, then **no rectangle at all** will be drawn around the text, and the border color will be ignored. This will leave anything "under" the text visible.
+          * 'Line', 'PolyLine', 'Polygon' annotations: use it to give applicable line end symbols a fill color other than that of the annotation *(changed in v1.16.16)*.
+
+      :arg bool cross_out: *(new in v1.17.2)* add two diagonal lines to the annotation rectangle. 'Redact' annotations only. If not desired, *False* must be specified even if the annotation was created with *False*.
+      :arg int rotate: new rotation value. Default (-1) means no change. Supports 'FreeText' and several other annotation types (see :meth:`Annot.setRotation`) [#f1]_. Only choose 0, 90, 180, or 270 degrees for 'FreeText'. Otherwise any integer is acceptable.
+
+      :rtype: bool
+
+
+   .. method:: fileInfo()
+
+      Basic information of the annot's attached file.
+
+      :rtype: dict
+      :returns: a dictionary with keys *filename*, *ufilename*, *desc* (description), *size* (uncompressed file size), *length* (compressed length) for FileAttachment annot types, else *None*.
+
+   .. method:: fileGet()
+
+      Returns attached file content.
+
+      :rtype: bytes
+      :returns: the content of the attached file.
+
+   .. index::
+      pair: buffer; fileUpd
+      pair: filename; fileUpd
+      pair: ufilename; fileUpd
+      pair: desc; fileUpd
+
+   .. method:: fileUpd(buffer=None, filename=None, ufilename=None, desc=None)
+
+      Updates the content of an attached file. All arguments are optional; calling the method without arguments is a no-op.
+
+      :arg bytes|bytearray|BytesIO buffer: the new file content. Omit to only change meta-information.
+
+         *(Changed in version 1.14.13)* *io.BytesIO* is now also supported.
+
+      :arg str filename: new filename to associate with the file.
+
+      :arg str ufilename: new unicode filename to associate with the file.
+
+      :arg str desc: new description of the file content.
+
+   .. attribute:: opacity
+
+      The annotation's transparency. If set, it is a value in range *[0, 1]*. The PDF default is *1.0*. However, to be able to tell the difference, *-1.0* is returned if the opacity is not set.
+
+      :rtype: float
+
+   .. attribute:: parent
+
+      The owning page object of the annotation.
+
+      :rtype: :ref:`Page`
+
+   .. attribute:: rotation
+
+      The annot rotation.
+
+      :rtype: int
+      :returns: a value in [-1, 359]. If no rotation is set at all, -1 is returned (which implies a rotation angle of 0). All other values are normalized to 0 <= angle < 360.
+
+   .. attribute:: rect
+
+      The rectangle containing the annotation.
+
+      :rtype: :ref:`Rect`
+
+   .. attribute:: next
+
+      The next annotation on this page or None.
+
+      :rtype: *Annot*
+
+   .. attribute:: type
+
+      A number and one or two strings describing the annotation type, like **[2, 'FreeText', 'FreeTextCallout']**. The second string entry is optional and may be empty. See the appendix :ref:`AnnotationTypes` for a list of possible values and their meanings.
+
+      :rtype: list
+
+   .. attribute:: info
+
+      A dictionary containing various information. All fields are optional strings. If an item is not provided, an empty string is returned.
+
+      * *name* -- e.g. for 'Stamp' annotations it will contain the stamp text like "Sold" or "Experimental", for other annot types you will see the name of the annot's icon here ("PushPin" for FileAttachment).
+
+      * *content* -- a string containing the text for type *Text* and *FreeText* annotations. Commonly used for filling the text field of annotation pop-up windows.
+
+      * *title* -- a string containing the title of the annotation pop-up window. By convention, this is used for the **annotation author**.
+
+      * *creationDate* -- creation timestamp.
+
+      * *modDate* -- last modified timestamp.
+
+      * *subject* -- subject.
+
+      * *id* -- *(new in version 1.16.10)* a unique identification of the annotation. This is taken from PDF key */NM*. Annotations added by PyMuPDF will have a unique name, which appears here.
+
+      :rtype: dict
+
+
+   .. attribute:: flags
+
+      An integer whose low order bits contain flags for how the annotation should be presented.
+
+      :rtype: int
+
+   .. attribute:: lineEnds
+
+      A pair of integers specifying the start and end symbols of annotation types 'FreeText', 'Line', 'PolyLine', and 'Polygon'. *None* if not applicable. For possible values and descriptions in this list, see the :ref:`AdobeManual`, table 8.27 on page 630.
+
+      :rtype: tuple
+
+   .. attribute:: vertices
+
+      A list containing a variable number of point ("vertices") coordinates (each given by a pair of floats) for various types of annotations:
+
+      * 'Line' -- the starting and ending coordinates (2 float pairs).
+      * 'FreeText' -- 2 or 3 float pairs designating the starting, the (optional) knee point, and the ending coordinates.
+      * 'PolyLine' / 'Polygon' -- the coordinates of the vertices connected by line pieces (n float pairs for n points).
+      * text markup annotations -- 4 float pairs specifying the *QuadPoints* of the marked text span (see :ref:`AdobeManual`, page 634).
+      * 'Ink' -- list of one to many sublists of vertex coordinates. Each such sublist represents a separate line in the drawing.
+
+      :rtype: list
+
+
+   .. attribute:: colors
+
+      A dictionary of two lists of floats in range *0 <= float <= 1* specifying the "stroke" and the interior ("fill") colors. The stroke color is used for borders and everything that is actively painted or written ("stroked"). The fill color is used for the interior of objects like line ends, circles and squares. The lengths of these lists implicitly determine the colorspaces used: 1 = GRAY, 3 = RGB, 4 = CMYK. So "[1.0, 0.0, 0.0]" stands for RGB color red. Both lists can be empty if no color is specified.
+
+      :rtype: dict
+
+   .. attribute:: xref
+
+      The PDF :data:`xref`.
+
+      :rtype: int
+
+   .. attribute:: border
+
+      A dictionary containing border characteristics. Empty if no border information exists. The following keys may be present:
+
+      * *width* -- a float indicating the border thickness in points. The value is -1.0 if no width is specified.
+
+      * *dashes* -- a sequence of integers specifying a line dash pattern. *[]* means no dashes, *[n]* means equal on-off lengths of *n* points, longer lists will be interpreted as specifying alternating on-off length values. See the :ref:`AdobeManual` page 217 for more details.
+
+      * *style* -- 1-byte border style:
+
+        - **"S"** (Solid) = solid rectangle surrounding the annotation
+        - **"D"** (Dashed) = dashed rectangle surrounding the annotation; the dash pattern is specified by the *dashes* entry
+        - **"B"** (Beveled) = a simulated embossed rectangle that appears to be raised above the surface of the page
+        - **"I"** (Inset) = a simulated engraved rectangle that appears to be recessed below the surface of the page
+        - **"U"** (Underline) = a single line along the bottom of the annotation rectangle
+
+      :rtype: dict
+
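:meth:`Annot.setRotation` is described as rotating the annotation rectangle around its center point and then calculating a new rectangle from the resulting quad. The plain-Python sketch below illustrates that geometry for a rectangle given as (x0, y0, x1, y1). It is an illustration of the idea, not PyMuPDF's code, and the helper name rotated_rect is hypothetical. It also shows why rotating back does not, in general, restore the original rectangle.

```python
import math


def rotated_rect(rect, angle):
    """Rotate rect around its center by angle degrees and return the
    enclosing rectangle (x0, y0, x1, y1) of the rotated corners."""
    x0, y0, x1, y1 = rect
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    rad = math.radians(angle)
    cos_a, sin_a = math.cos(rad), math.sin(rad)
    xs, ys = [], []
    for x, y in [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]:  # the four corners
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_a - dy * sin_a)
        ys.append(cy + dx * sin_a + dy * cos_a)
    return (min(xs), min(ys), max(xs), max(ys))


r = (0, 0, 100, 50)
r90 = rotated_rect(r, 90)                      # roughly (25, -25, 75, 75)
back = rotated_rect(rotated_rect(r, 45), -45)  # larger than r
print([round(v, 6) for v in r90])
print([round(v, 6) for v in back])
```

A 90 degree rotation just swaps the side lengths around the same center, but the 45 degree round trip ends with a rectangle that is larger than the one it started from.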
+
+.. _mupdficons:
+
+Annotation Icons in MuPDF
+-------------------------
+This is a list of icons referenceable by name for annotation types 'Text' and 'FileAttachment'. You can use them via the *icon* parameter when adding an annotation, or use them as the argument of :meth:`Annot.setName`. It is left to your discretion which item to choose when -- no mechanism will keep you from using e.g. the "Speaker" icon for a 'FileAttachment'.
+
+.. image:: images/mupdf-icons.jpg
+
+
+Example
+--------
+Change the graphical image of an annotation. Also update the "author" and the text to be shown in the popup window::
+
+ import fitz
+
+ doc = fitz.open("circle-in.pdf")
+ page = doc[0]                          # page 0
+ annot = page.firstAnnot                # get the annotation
+ annot.setBorder({"dashes": [3]})       # set dashes to "3 on, 3 off ..."
+
+ # set stroke and fill color to some blue
+ annot.setColors({"stroke":(0, 0, 1), "fill":(0.75, 0.8, 0.95)})
+ info = annot.info                      # get info dict
+ info["title"] = "Jorj X. McKie"        # set author
+
+ # text in popup window ...
+ info["content"] = "I changed border and colors and enlarged the image by 20%."
+ info["subject"] = "Demonstration of PyMuPDF"     # some PDF viewers also show this
+ annot.setInfo(info)                    # update info dict
+ r = annot.rect                         # take annot rect
+ r.x1 = r.x0 + r.width  * 1.2           # new location has same top-left
+ r.y1 = r.y0 + r.height * 1.2           # but 20% longer sides
+ annot.setRect(r)                       # update rectangle
+ annot.update()                         # update the annot's appearance
+ doc.save("circle-out.pdf")             # save
+
+This is what the circle annotation looks like before and after the change (pop-up windows displayed using Nitro PDF viewer):
+
+|circle|
+
+.. |circle| image:: images/img-circle.png
+
+
+.. rubric:: Footnotes
+
+.. [#f1] Rotating an annotation generally also changes its rectangle. Depending on how the annotation was defined, the original rectangle is in general **not reconstructible** by setting the rotation value back to zero, because this information may be lost.
diff --git a/docs/app1.rst b/docs/app1.rst

new file mode 100644

index 0000000..4e54cda
--- /dev/null
+++ b/docs/app1.rst
@@ -0,0 +1,162 @@
+===============================
+Appendix 1: Performance
+===============================
+
+We have tried to get an impression of PyMuPDF's performance. While we know this is very hard and a fair comparison is almost impossible, we feel that we should at least provide some quantitative information to justify our bold claims about MuPDF's **top performance**.
+
+Following are three sections that deal with different aspects of performance:
+
+* document parsing
+* text extraction
+* image rendering
+
+In each section, the same fixed set of PDF files is processed by a set of tools. The set of tools varies, for reasons we explain in each section.
+
+.. |fsizes| image:: images/img-filesizes.png
+
+Here is the list of files we are using. Each file name is accompanied by further information: **size** in bytes, number of **pages**, number of bookmarks (**toc** entries), number of **links**, **text** size as a percentage of file size, **KB** per page, PDF **version** and remarks. **text %** and **KB index** are indicators for whether a file is text or graphics oriented.
+
+|fsizes|
+
+E.g. *Adobe.pdf* and *PyMuPDF.pdf* are clearly text oriented; all other files contain many more images.
+
+
+
+Part 1: Parsing
+~~~~~~~~~~~~~~~~
+
+How fast is a PDF file read and its content parsed for further processing? The sheer parsing performance cannot be compared directly, because batch utilities always execute a requested task completely, in one go, front to end. *pdfrw*, too, has a *lazy* parsing strategy, meaning it only parses those parts of a document that are required at any given moment.
+
+To still find an answer to the question, we therefore measure the time each tool needs to copy a PDF file to an output file, doing nothing else.
+
+**These were the tools**
+
+All tools are either platform independent or can at least run on both Windows and Unix / Linux (pdftk).
+
+**Poppler** is missing here, because it is specifically a Linux tool set, although we know there exist Windows ports (apparently created with considerable effort). Technically, it is a C/C++ library for which a Python binding exists, so to that extent it is comparable to PyMuPDF. But Poppler, in contrast, is tightly coupled to **Qt** and **Cairo**. We may still include it in the future, when a more handy Windows installation is available. We have, however, seen an `analysis <http://hzqtc.github.io/2012/04/poppler-vs-mupdf.html>`_ that hints at a much lower performance than MuPDF. Our comparison of text extraction speeds also shows a much lower performance of Poppler's PDF code base **Xpdf**.
+
+MuPDF's image rendering is also about three times faster than that of Xpdf when comparing the command line tools *mudraw* of MuPDF and *pdftopng* of Xpdf -- see part 3 of this chapter.
+
+========= ==========================================================================
+Tool      Description
+========= ==========================================================================
+PyMuPDF   tool of this manual, appearing as "fitz" in reports
+pdfrw     a pure Python tool used by rst2pdf, with an interface to ReportLab
+PyPDF2    a pure Python tool with a very complete function set
+pdftk     a command line utility with numerous functions
+========= ==========================================================================
+
+This is how each of the tools was used:
+
+**PyMuPDF**::
+
+ import fitz
+
+ doc = fitz.open("input.pdf")
+ doc.save("output.pdf")
+
+**pdfrw**::
+
+ from pdfrw import PdfReader, PdfWriter
+
+ doc = PdfReader("input.pdf")
+ writer = PdfWriter()
+ writer.trailer = doc
+ writer.write("output.pdf")
+
+**PyPDF2**::
+
+ import PyPDF2
+
+ pdfmerge = PyPDF2.PdfFileMerger()
+ pdfmerge.append("input.pdf")
+ pdfmerge.write("output.pdf")
+ pdfmerge.close()
+
+**pdftk**::
+
+ pdftk input.pdf output output.pdf
+
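The measurement itself (the time each tool needs to copy a file) can be sketched with a small harness like the one below. It is hypothetical: the original benchmark script is not shown here. time_tool accepts any copy function taking a source and a destination path; pymupdf_copy wraps the PyMuPDF variant shown above and needs PyMuPDF installed only when it is actually called.

```python
import time
from typing import Callable, Dict, Iterable


def time_tool(copy: Callable[[str, str], None], files: Iterable[str]) -> Dict[str, float]:
    """Return the wall-clock seconds `copy` needs for each input file."""
    timings = {}
    for name in files:
        start = time.perf_counter()
        copy(name, name + ".out")  # copy input to an output file, nothing else
        timings[name] = time.perf_counter() - start
    return timings


def pymupdf_copy(src: str, dst: str) -> None:
    """Copy a PDF with PyMuPDF (hypothetical wrapper around the snippet above)."""
    import fitz  # imported lazily, so the harness itself runs without PyMuPDF

    doc = fitz.open(src)
    doc.save(dst)
    doc.close()
```

For example, time_tool(pymupdf_copy, ["input.pdf"]) would time the PyMuPDF copy of one file; wrappers for the other tools would follow the same pattern.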
+
+**Observations**
+
+.. |cpyspeed1| image:: images/img-copy-speed-1.png
+.. |cpyspeed2| image:: images/img-copy-speed-2.png
+
+These are our run time findings (in **seconds**, please note the European number convention: meaning of decimal point and comma is reversed):
+
+|cpyspeed1|
+
+If we leave out the Adobe manual, this table looks like this:
+
+|cpyspeed2|
+
+PyMuPDF is by far the fastest: on average 4.5 times faster than the second best (the pure Python tool pdfrw, **chapeau pdfrw!**), and almost 20 times faster than the command line tool pdftk.
+
+While PyMuPDF requires less than 13 seconds to process all files, pdftk takes almost 4 minutes.
+
+By far the slowest tool is PyPDF2: it is more than 66 times slower than PyMuPDF and 15 times slower than pdfrw! PyPDF2's poor showing is mainly caused by the Adobe manual: the tool is obviously slowed down by the linear file structure and the immense number of bookmarks in this file. If we take out this special case, then PyPDF2 is only 21.5 times slower than PyMuPDF, 4.5 times slower than pdfrw and 1.2 times slower than pdftk.
+
+If we look at the output PDFs, there is one surprise:
+
+Each tool created a PDF similar in size to the original. Apart from the Adobe case, PyMuPDF always created the smallest output.
+
+Adobe's manual is an exception: The pure Python tools pdfrw and PyPDF2 **reduced** its size by more than 20% (and yielded a document which is no longer linearized)!
+
+PyMuPDF and pdftk in contrast **drastically increased** the size by 40% to about 50 MB (also no longer linearized).
+
+So far, we have no explanation for what is happening here.
+
+
+Part 2: Text Extraction
+~~~~~~~~~~~~~~~~~~~~~~~~
+We have also compared text extraction speed with other tools.
+
+The following table shows a run time comparison. PyMuPDF's methods appear as "fitz (TEXT)" and "fitz (JSON)" respectively. The tool *pdftotext.exe* of the `Xpdf <http://www.foolabs.com/xpdf/>`_ toolset appears as "xpdf".
+
+* **extractText():** basic text extraction without layout re-arrangement (using *GetText(..., output = "text")*)
+* **pdftotext:** a command line tool of the **Xpdf** toolset (which also is the basis of `Poppler's library <http://poppler.freedesktop.org/>`_)
+* **extractJSON():** text extraction with layout information (using *GetText(..., output = "json")*)
+* **pdfminer:** a pure Python PDF tool specialized on text extraction tasks
+
+All tools have been used with their most basic functionality -- no layout re-arrangements, etc.
+
+For demonstration purposes, we have included a version of *GetText(doc, output = "json")* that also re-arranges the output according to its position on the page.
+
+.. |textperf| image:: images/img-textperformance.png
+
+Here are the results using the same test files as above (again: decimal point and comma reversed):
+
+|textperf|
+
+Again, (Py-) MuPDF is the fastest around. It is 2.3 to 2.6 times faster than xpdf.
+
+*pdfminer*, as a pure Python solution, is of course comparatively slow: MuPDF is 50 to 60 times faster and xpdf is 23 times faster. These orders of magnitude agree with the statements on this `web site <http://www.unixuser.org/~euske/python/pdfminer/>`_.
+
+Part 3: Image Rendering
+~~~~~~~~~~~~~~~~~~~~~~~~
+We have tested the rendering speed of MuPDF against *pdftopng.exe*, a command line tool of the **Xpdf** toolset (the PDF code basis of **Poppler**).
+
+**MuPDF invocation using a resolution of 150 dpi (Xpdf default)**::
+
+ mutool draw -o t%d.png -r 150 file.pdf
+
+**PyMuPDF invocation**::
+
+ import fitz
+
+ zoom = 150.0 / 72.0              # 150 dpi instead of PDF's 72 dpi
+ mat = fitz.Matrix(zoom, zoom)
+
+ def ProcessFile(datei):
+     print("processing:", datei)
+     doc = fitz.open(datei)
+     for p in doc:                # iterate over all pages
+         pix = p.getPixmap(matrix=mat, alpha=False)
+         pix.writePNG("t-%s.png" % p.number)
+         pix = None               # release the pixmap
+     doc.close()
+
+**Xpdf invocation:**
+::
+ pdftopng.exe file.pdf ./
+
+.. |renderspeed| image:: images/img-render-speed.png
+
+The resulting runtimes can be found here (again: meaning of decimal point and comma reversed):
+
+|renderspeed|
+
+* MuPDF and PyMuPDF are both about 3 times faster than Xpdf.
+
+* The 2% speed difference between MuPDF (a utility written in C) and PyMuPDF is due to Python overhead.
diff --git a/docs/app2.rst b/docs/app2.rst

new file mode 100644 (file)

index 0000000..760810b
--- /dev/null
+++ b/docs/app2.rst
@@ -0,0 +1,321 @@
+.. _Appendix2:
+
+======================================
+Appendix 2: Details on Text Extraction
+======================================
+This chapter provides background on the text extraction methods of PyMuPDF.
+
+The information of interest is:
+
+* what do they provide?
+* what do they imply (processing time / data sizes)?
+
+General structure of a TextPage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+:ref:`TextPage` is one of PyMuPDF's classes. It is normally created behind the scenes when :ref:`Page` text extraction methods are used, but it is also available directly. In either case, an intermediate :ref:`DisplayList` must be created first (display lists contain interpreted pages; they also provide the input for :ref:`Pixmap` creation). Information contained in a :ref:`TextPage` has the following hierarchy. Contrary to what its name suggests, a text page may optionally also contain images::
+
+ <page>
+     <text block>
+         <line>
+             <span>
+                 <char>
+     <image block>
+         <img>
+
+A **text page** consists of blocks (= roughly paragraphs).
+
+A **block** consists of either lines and their characters, or an image.
+
+A **line** consists of spans.
+
+A **span** consists of adjacent characters with identical font properties: name, size, flags and color.
+
+Plain Text
+~~~~~~~~~~
+
+Function :meth:`TextPage.extractText` (or *Page.getText("text")*) extracts a page's plain **text in original order** as specified by the creator of the document (which may differ from the natural reading order).
+
+An example output::
+
+    >>> print(page.getText("text"))
+    Some text on first page.
+
+
+BLOCKS
+~~~~~~~~~~
+
+Function :meth:`TextPage.extractBLOCKS` (or *Page.getText("blocks")*) extracts a page's text blocks as a list of items like::
+
+    (x0, y0, x1, y1, "lines in block", block_type, block_no)
+
+The first 4 items are the float coordinates of the block's bbox. The lines within each block are concatenated by a new-line character.
+
+This is a high-speed method with enough information to re-arrange the page's text in natural reading order where required.
+
+Example output::
+
+    >>> print(page.getText("blocks"))
+    [(50.0, 88.17500305175781, 166.1709747314453, 103.28900146484375,
+    'Some text on first page.', 0, 0)]
+
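To illustrate how BLOCKS output can be re-arranged, here is a minimal sketch of a hypothetical helper that sorts block tuples top-to-bottom, then left-to-right. It assumes *block_type* 0 denotes a text block (as *"type"* does in the DICT output) and uses rounding of *y0* to whole points as a crude tolerance for blocks starting on the same line.

```python
def reading_order(blocks):
    """Sort BLOCKS-style tuples top-to-bottom, then left-to-right.

    Each item: (x0, y0, x1, y1, text, block_type, block_no).
    """
    # round y0 so blocks starting on (almost) the same line sort by x0
    return sorted(blocks, key=lambda b: (round(b[1]), b[0]))


def text_in_reading_order(blocks):
    """Join the text of all text blocks (block_type 0) in reading order."""
    return "\n".join(b[4] for b in reading_order(blocks) if b[5] == 0)


blocks = [
    (300.0, 50.2, 500.0, 80.0, "right column", 0, 0),
    (50.0, 50.0, 250.0, 80.0, "left column", 0, 1),
    (50.0, 100.0, 500.0, 130.0, "full width", 0, 2),
]
print(text_in_reading_order(blocks))
```

A plain *(y0, x0)* sort reads multi-column pages row by row; column-aware layouts need additional logic.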
+
+WORDS
+~~~~~~~~~~
+
+Function :meth:`TextPage.extractWORDS` (or *Page.getText("words")*) extracts a page's text **words** as a list of items like::
+
+    (x0, y0, x1, y1, "word", block_no, line_no, word_no)
+
+The first 4 items are the float coordinates of the word's bbox. The last three integers locate the word more precisely: its block number, line number and word number.
+
+This is a high-speed method with enough information to extract text contained in a given rectangle.
+
+Example output::
+
+    >>> for word in page.getText("words"):
+    ...     print(word)
+    (50.0, 88.17500305175781, 78.73200225830078, 103.28900146484375,
+    'Some', 0, 0, 0)
+    (81.79000091552734, 88.17500305175781, 99.5219955444336, 103.28900146484375,
+    'text', 0, 0, 1)
+    (102.57999420166016, 88.17500305175781, 114.8119888305664, 103.28900146484375,
+    'on', 0, 0, 2)
+    (117.86998748779297, 88.17500305175781, 135.5909881591797, 103.28900146484375,
+    'first', 0, 0, 3)
+    (138.64898681640625, 88.17500305175781, 166.1709747314453, 103.28900146484375,
+    'page.', 0, 0, 4)
+
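As a sketch of extracting the text inside a given rectangle, the following hypothetical helper keeps only words whose bbox lies completely inside the rectangle and restores their original order via the block, line and word numbers. The "fully contained" criterion is our choice; intersecting words could be accepted instead.

```python
def words_in_rect(words, rect):
    """Return the text of all WORDS items lying completely inside rect.

    words: (x0, y0, x1, y1, word, block_no, line_no, word_no) tuples
    rect:  (x0, y0, x1, y1) in the same coordinate system
    """
    rx0, ry0, rx1, ry1 = rect
    hits = [
        w for w in words
        if w[0] >= rx0 and w[1] >= ry0 and w[2] <= rx1 and w[3] <= ry1
    ]
    # restore the original order via block, line and word numbers
    hits.sort(key=lambda w: (w[5], w[6], w[7]))
    return " ".join(w[4] for w in hits)


words = [  # the sample output shown above
    (50.0, 88.17500305175781, 78.73200225830078, 103.28900146484375, "Some", 0, 0, 0),
    (81.79000091552734, 88.17500305175781, 99.5219955444336, 103.28900146484375, "text", 0, 0, 1),
    (102.57999420166016, 88.17500305175781, 114.8119888305664, 103.28900146484375, "on", 0, 0, 2),
    (117.86998748779297, 88.17500305175781, 135.5909881591797, 103.28900146484375, "first", 0, 0, 3),
    (138.64898681640625, 88.17500305175781, 166.1709747314453, 103.28900146484375, "page.", 0, 0, 4),
]
print(words_in_rect(words, (80, 80, 136, 110)))
```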
+HTML
+~~~~
+
+:meth:`TextPage.extractHTML` (or *Page.getText("html")*) output fully reflects the structure of the page's *TextPage*, much like DICT / JSON below. This includes images, font information and text positions. If wrapped in HTML header and trailer code, it can readily be displayed by a web browser. Our above example::
+
+    >>> for line in page.getText("html").splitlines():
+    ...     print(line)
+
+    <div id="page0" style="position:relative;width:300pt;height:350pt;
+    background-color:white">
+    <p style="position:absolute;white-space:pre;margin:0;padding:0;top:88pt;
+    left:50pt"><span style="font-family:Helvetica,sans-serif;
+    font-size:11pt">Some text on first page.</span></p>
+    </div>
+
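The "header and trailer code" can be as small as the following sketch. The helper name and the minimal HTML5 skeleton (including the UTF-8 charset declaration) are our own assumptions, not something PyMuPDF provides.

```python
import html


def wrap_html(body, title="page"):
    """Wrap a getText("html") fragment in a minimal HTML document."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"  # escape <, >, & in the title
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


fragment = '<div id="page0"><p>Some text on first page.</p></div>'
print(wrap_html(fragment, "demo"))
```

The result can be written to a *.html* file and opened in a browser; the font caveats below still apply.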
+
+.. _HTMLQuality:
+
+Controlling Quality of HTML Output
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+While HTML output has improved a lot in MuPDF v1.12.0, it is not yet bug-free: we have found problems in the areas **font support** and **image positioning**.
+
+* HTML text contains references to the fonts of the original document. If the browser does not know these fonts (which is quite likely), it will substitute its own choices, which may make the result look awkward. This issue varies greatly by browser: on my Windows machine, MS Edge worked just fine, whereas Firefox looked horrible.
+
+* For PDFs with a complex structure, images may not be positioned and / or sized correctly. This seems to be the case for rotated pages and for pages where the various possible page bbox variants do not coincide (e.g. *MediaBox != CropBox*). We do not yet know how to address this; we have filed a bug report with MuPDF.
+
+To address the font issue, you can use a simple utility script to scan through the HTML file and replace font references. Here is a little example that replaces all fonts with one of the :ref:`Base-14-Fonts`: serifed fonts will become "Times", non-serifed "Helvetica" and monospaced will become "Courier". Their respective variations for "bold", "italic", etc. should then be handled by your browser. The result is written to a new file with ".html" appended to the original name::
+
+ import re
+ import sys
+
+ font_map = {                                  # generic family -> new spec
+     "serif": "font-family:Times",             # enter ...
+     "sans-serif": "font-family:Helvetica",    # ... your choices ...
+     "monospace": "font-family:Courier",       # ... here
+ }
+ # a font spec like "font-family:Helvetica,sans-serif", ended by ";" or '"'
+ font_spec = re.compile(r'font-family:[^;"]*,(serif|sans-serif|monospace)(?=[;"])')
+
+ filename = sys.argv[1]
+ with open(filename, encoding="utf-8") as f:
+     otext = f.read()                          # original html text string
+
+ # replace each font spec by the choice for its generic family
+ ntext, count = font_spec.subn(lambda m: font_map[m.group(1)], otext)
+
+ if count > 0:                                 # any font spec replaced?
+     with open(filename + ".html", "w", encoding="utf-8") as ofile:
+         ofile.write(ntext)
+ else:
+     print("Warning: could not find any font specs!")
+
+
+
+DICT (or JSON)
+~~~~~~~~~~~~~~~~
+
+:meth:`TextPage.extractDICT` (or *Page.getText("dict")*) output fully reflects the structure of a *TextPage* and provides image content and position details (*bbox*: boundary boxes in page coordinates, i.e. points) for every block, line and span. This information can be used to present text in another reading order if required (e.g. from top-left to bottom-right). Images are stored as *bytes* (*bytearray* in Python 2) for DICT output and as base64 encoded strings for JSON output.
+
+For a visualization of the dictionary structure, have a look at :ref:`textpagedict`.
+
+Here is what this looks like::
+
+    {
+        "width": 300.0,
+        "height": 350.0,
+        "blocks": [{
+            "type": 0,
+            "bbox": [50.0, 88.17500305175781, 166.1709747314453, 103.28900146484375],
+            "lines": [{
+                "wmode": 0,
+                "dir": [1.0, 0.0],
+                "bbox": [50.0, 88.17500305175781, 166.1709747314453, 103.28900146484375],
+                "spans": [{
+                    "size": 11.0,
+                    "flags": 0,
+                    "font": "Helvetica",
+                    "color": 0,
+                    "text": "Some text on first page.",
+                    "bbox": [50.0, 88.17500305175781, 166.1709747314453, 103.28900146484375]
+                }]
+            }]
+        }]
+    }
+
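As a sketch of walking this structure, the following hypothetical helper rebuilds plain text from a dictionary shaped like the one above. It assumes image blocks carry *"type": 1* (text blocks have *"type": 0*) and optionally sorts blocks top-to-bottom, then left-to-right, by their *bbox* to approximate a natural reading order.

```python
def dict_to_text(page_dict, sort_blocks=False):
    """Rebuild plain text from a DICT-style structure (text blocks only)."""
    blocks = page_dict["blocks"]
    if sort_blocks:
        # top-to-bottom, then left-to-right, using each block's bbox
        blocks = sorted(blocks, key=lambda b: (b["bbox"][1], b["bbox"][0]))
    lines_out = []
    for block in blocks:
        if block["type"] != 0:          # assumed: 1 = image block
            continue
        for line in block["lines"]:
            # a line's text is the concatenation of its spans' text
            lines_out.append("".join(span["text"] for span in line["spans"]))
    return "\n".join(lines_out)


page = {"width": 300.0, "height": 350.0, "blocks": [
    {"type": 0, "bbox": [50, 200, 150, 215],
     "lines": [{"spans": [{"text": "second"}]}]},
    {"type": 1, "bbox": [50, 100, 150, 180]},          # an image block
    {"type": 0, "bbox": [50, 88, 150, 103],
     "lines": [{"spans": [{"text": "first "}, {"text": "line"}]}]},
]}
print(dict_to_text(page, sort_blocks=True))
```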
+RAWDICT
+~~~~~~~~~~~~~~~~
+:meth:`TextPage.extractRAWDICT` (or *Page.getText("rawdict")*) is an **information superset of DICT** and takes the detail level one step deeper. It looks exactly like the above, except that the *"text"* items (*string*) are replaced by *"chars"* items (*list*). Each *"chars"* entry is a character *dict*. For example, here is what you would see in place of the item *"text": "Some text on first page."* above::
+
+    "chars": [{
+        "origin": [50.0, 100.0],
+        "bbox": [50.0, 88.17500305175781, 57.336997985839844, 103.28900146484375],
+        "c": "S"
+    }, {
+        "origin": [57.33700180053711, 100.0],
+        "bbox": [57.33700180053711, 88.17500305175781, 63.4530029296875, 103.28900146484375],
+        "c": "o"
+    }, {
+        "origin": [63.4530029296875, 100.0],
+        "bbox": [63.4530029296875, 88.17500305175781, 72.61600494384766, 103.28900146484375],
+        "c": "m"
+    }, {
+        "origin": [72.61600494384766, 100.0],
+        "bbox": [72.61600494384766, 88.17500305175781, 78.73200225830078, 103.28900146484375],
+        "c": "e"
+    }, {
+        "origin": [78.73200225830078, 100.0],
+        "bbox": [78.73200225830078, 88.17500305175781, 81.79000091552734, 103.28900146484375],
+        "c": " "
+    < ... deleted ... >
+    }, {
+        "origin": [163.11297607421875, 100.0],
+        "bbox": [163.11297607421875, 88.17500305175781, 166.1709747314453, 103.28900146484375],
+        "c": "."
+    }],
+
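Because RAWDICT only refines DICT, a span's text can be recovered from its *"chars"* list. The hypothetical helper below joins the characters and recomputes the span bbox, assuming it is the union of the character bboxes (this matches the sample values above, but is our assumption).

```python
def span_from_chars(chars):
    """Rebuild a span's text and bbox from its RAWDICT "chars" list."""
    if not chars:
        return "", None
    text = "".join(ch["c"] for ch in chars)
    # assumed: the span bbox is the union of all character bboxes
    x0 = min(ch["bbox"][0] for ch in chars)
    y0 = min(ch["bbox"][1] for ch in chars)
    x1 = max(ch["bbox"][2] for ch in chars)
    y1 = max(ch["bbox"][3] for ch in chars)
    return text, (x0, y0, x1, y1)


chars = [  # first four characters of the sample above
    {"origin": [50.0, 100.0], "c": "S",
     "bbox": [50.0, 88.17500305175781, 57.336997985839844, 103.28900146484375]},
    {"origin": [57.33700180053711, 100.0], "c": "o",
     "bbox": [57.33700180053711, 88.17500305175781, 63.4530029296875, 103.28900146484375]},
    {"origin": [63.4530029296875, 100.0], "c": "m",
     "bbox": [63.4530029296875, 88.17500305175781, 72.61600494384766, 103.28900146484375]},
    {"origin": [72.61600494384766, 100.0], "c": "e",
     "bbox": [72.61600494384766, 88.17500305175781, 78.73200225830078, 103.28900146484375]},
]
print(span_from_chars(chars))
```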
+
+XML
+~~~
+
+The :meth:`TextPage.extractXML` (or *Page.getText("xml")*) version extracts text (no images) with the detail level of RAWDICT::
+
+    >>> for line in page.getText("xml").splitlines():
+    ...     print(line)
+
+    <page id="page0" width="300" height="350">
+    <block bbox="50 88.175 166.17098 103.289">
+    <line bbox="50 88.175 166.17098 103.289" wmode="0" dir="1 0">
+    <font name="Helvetica" size="11">
+    <char quad="50 88.175 57.336999 88.175 50 103.289 57.336999 103.289" x="50"
+    y="100" color="#000000" c="S"/>
+    <char quad="57.337 88.175 63.453004 88.175 57.337 103.289 63.453004 103.289" x="57.337"
+    y="100" color="#000000" c="o"/>
+    <char quad="63.453004 88.175 72.616008 88.175 63.453004 103.289 72.616008 103.289" x="63.453004"
+    y="100" color="#000000" c="m"/>
+    <char quad="72.616008 88.175 78.732 88.175 72.616008 103.289 78.732 103.289" x="72.616008"
+    y="100" color="#000000" c="e"/>
+    <char quad="78.732 88.175 81.79 88.175 78.732 103.289 81.79 103.289" x="78.732"
+    y="100" color="#000000" c=" "/>
+
+    ... deleted ...
+
+    <char quad="163.11298 88.175 166.17098 88.175 163.11298 103.289 166.17098 103.289" x="163.11298"
+    y="100" color="#000000" c="."/>
+    </font>
+    </line>
+    </block>
+    </page>
+
+.. note:: We have successfully tested `lxml <https://pypi.org/project/lxml/>`_ to interpret this output.
+
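The note above refers to *lxml*. As a dependency-free sketch, the following hypothetical helper uses the standard library's *xml.etree.ElementTree* instead (our substitution) to rebuild the text of every *<line>* element from its nested *<char>* elements.

```python
import xml.etree.ElementTree as ET


def xml_lines(xml_text):
    """Return the text of every <line> in getText("xml") output."""
    root = ET.fromstring(xml_text)
    result = []
    for line in root.iter("line"):
        # <char> elements are nested in <font> elements inside each <line>
        result.append("".join(ch.get("c") for ch in line.iter("char")))
    return result


sample = """<page id="page0" width="300" height="350">
<block bbox="50 88.175 166.17098 103.289">
<line bbox="50 88.175 166.17098 103.289" wmode="0" dir="1 0">
<font name="Helvetica" size="11">
<char x="50" y="100" color="#000000" c="S"/>
<char x="57.337" y="100" color="#000000" c="o"/>
<char x="63.453004" y="100" color="#000000" c=" "/>
<char x="72.616008" y="100" color="#000000" c="x"/>
</font>
</line>
</block>
</page>"""
print(xml_lines(sample))
```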
+XHTML
+~~~~~
+:meth:`TextPage.extractXHTML` (or *Page.getText("xhtml")*) is a variation of TEXT but in HTML format, containing the bare text and images ("semantic" output)::
+
+    <div id="page0">
+    <p>Some text on first page.</p>
+    </div>
+
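Since XHTML output is "semantic", the bare paragraph text is easy to recover. Here is a minimal sketch using the standard library's *html.parser*; the class name is hypothetical and images are simply ignored.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of all <p> elements in getText("xhtml") output."""

    def __init__(self):
        super().__init__()           # convert_charrefs=True: &amp; -> &
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:               # text of nested tags (e.g. <b>) included
            self.paragraphs[-1] += data


parser = ParagraphCollector()
parser.feed('<div id="page0">\n<p>Some text on first page.</p>\n</div>')
parser.close()
print(parser.paragraphs)
```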
+.. _text_extraction_flags:
+
+Text Extraction Flags Defaults
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*(New in version 1.16.2)* Method :meth:`Page.getText` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the default settings (flags parameter omitted or *None*) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`.
+
+=================== ==== ==== ===== === ==== ======= ===== ======
+Indicator           text html xhtml xml dict rawdict words blocks
+=================== ==== ==== ===== === ==== ======= ===== ======
+preserve ligatures  1    1    1     1   1    1       1     1
+preserve whitespace 1    1    1     1   1    1       1     1
+preserve images     n/a  1    1     n/a 1    1       n/a   0
+inhibit spaces      0    0    0     0   0    0       0     0
+=================== ==== ==== ===== === ==== ======= ===== ======
+
+* **"json"** is handled exactly like **"dict"** and is hence left out.
+* An "n/a" specification means a value of 0 and setting this bit never has any effect on the output (but an adverse effect on performance).
+* If you are not interested in images when using an output variant which includes them by default, then by all means set the respective bit off: You will experience a better performance and much lower space requirements.
+
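Because *flags* replaces the defaults entirely, options are combined with bitwise operators. The sketch below uses local constants whose values are assumed to match *fitz.TEXT_PRESERVE_LIGATURES*, *fitz.TEXT_PRESERVE_WHITESPACE*, *fitz.TEXT_PRESERVE_IMAGES* and *fitz.TEXT_INHIBIT_SPACES*; please verify them against your installed version and prefer the *fitz* names in real code.

```python
# assumed to mirror the fitz constants of the same names
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4
TEXT_INHIBIT_SPACES = 8

# "dict" defaults according to the table: ligatures, whitespace, images
DICT_DEFAULT = (TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
                | TEXT_PRESERVE_IMAGES)

# same as the "dict" default, but without images (faster, less memory)
dict_no_images = DICT_DEFAULT & ~TEXT_PRESERVE_IMAGES

# keep the "text" defaults and additionally inhibit spaces:
# every other desired bit must be included explicitly
text_inhibit = (TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
                | TEXT_INHIBIT_SPACES)

print(DICT_DEFAULT, dict_no_images, text_inhibit)
```

Such a value would then be passed as e.g. *page.getText("dict", flags=dict_no_images)*.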
+To show the effect of *TEXT_INHIBIT_SPACES* have a look at this example::
+
+    >>> print(page.getText("text"))
+    H a l l o !
+    Mo r e  t e x t
+    i s  f o l l o w i n g
+    i n  E n g l i s h
+    . . .  l e t ' s  s e e
+    w h a t  h a p p e n s .
+    >>> print(page.getText("text", flags=fitz.TEXT_INHIBIT_SPACES))
+    Hallo!
+    More text
+    is following
+    in English
+    ... let's see
+    what happens.
+    >>> 
+
+
+Performance
+~~~~~~~~~~~~
+The text extraction methods differ significantly: in terms of information they supply, and in terms of resource requirements and runtimes. Generally, more information of course means that more processing is required and a higher data volume is generated.
+
+.. note:: Images in particular have a **very significant** impact. Make sure to exclude them (via the *flags* parameter) whenever you do not need them. Processing the 2'700 pages mentioned below with default flags settings took 160 seconds across all extraction methods. With all images excluded, less than 50% of that time (77 seconds) was needed.
+
+To begin with, all methods are **very fast** compared to other products on the market. In terms of processing speed, we are not aware of a faster (free) tool. Even the most detailed method, RAWDICT, processes all 1'310 pages of the :ref:`AdobeManual` in less than 5 seconds (simple text needs less than 2 seconds here).
+
+The following table shows average relative run times ("RSpeed", baseline 1.00 is TEXT; higher values mean slower), taken across ca. 1'400 text-heavy and 1'300 image-heavy pages.
+
+======= ====== ===================================================================== ==========
+Method  RSpeed Comments                                                               no images
+======= ====== ===================================================================== ==========
+TEXT     1.00  no images, **plain** text, line breaks                                 1.00
+BLOCKS   1.00  image bboxes (only), **block** level text with bboxes, line breaks     1.00
+WORDS    1.02  no images, **word** level text with bboxes                             1.02
+XML      2.72  no images, **char** level text, layout and font details                2.72
+XHTML    3.32  **base64** images, **span** level text, no layout info                 1.00
+HTML     3.54  **base64** images, **span** level text, layout and font details        1.01
+DICT     3.93  **binary** images, **span** level text, layout and font details        1.04
+RAWDICT  4.50  **binary** images, **char** level text, layout and font details        1.68
+======= ====== ===================================================================== ==========
+
+As mentioned: when excluding all images (last column), the relative speeds change drastically. Except for RAWDICT and XML, all methods are almost equally fast, and RAWDICT requires roughly 40% less execution time (1.68 vs. 2.72) than XML, which is **now the slowest**.
+
+Look at chapter **Appendix 1** for more performance information.
diff --git a/docs/app3.rst b/docs/app3.rst

new file mode 100644 (file)

index 0000000..4740cd8
--- /dev/null
+++ b/docs/app3.rst
@@ -0,0 +1,32 @@
+.. _Appendix 3:
+
+================================================
+Appendix 3: Considerations on Embedded Files
+================================================
+This chapter provides some background on embedded files support in PyMuPDF.
+
+General
+----------
+Starting with version 1.4, PDF supports embedding arbitrary files as part of a PDF document file ("Embedded File Streams", see section 3.10.3, p. 184 of the :ref:`AdobeManual`).
+
+In many aspects, this is comparable to concepts also found in ZIP files or the OLE technique in MS Windows. PDF embedded files do, however, *not* support directory structures as does the ZIP format. An embedded file can in turn contain embedded files itself.
+
+An advantage of this concept is that embedded files are under the PDF umbrella, benefitting from its permissions / password protection and integrity aspects: all data which a PDF may reference, or even depend on, can be bundled into it and thus form a single, consistent unit of information.
+
+In addition to embedded files, PDF 1.7 adds *collections* to its support range. This is an advanced way of storing and presenting meta information (i.e. arbitrary and extensible properties) of embedded files.
+
+MuPDF Support
+--------------
+After adding initial support for collections (portfolios) and */EmbeddedFiles* in MuPDF version 1.11, this support was dropped again in version 1.15.
+
+As a consequence, the cli utility *mutool* no longer offers access to embedded files.
+
+PyMuPDF -- having implemented an */EmbeddedFiles* API in response in its version 1.11.0 -- was therefore forced to change gears starting with its version 1.16.0 (we never published a MuPDF v1.15.x compatible PyMuPDF).
+
+We are now maintaining our own code basis supporting embedded files. This code makes use of basic MuPDF dictionary and array functions only.
+
+PyMuPDF Support
+------------------
+We continue to support the full old API with respect to embedded files -- with only minor, cosmetic changes.
+
+There is also a new function, :meth:`Document.embeddedFileNames`, which delivers a list of all names under which embedded data are registered in a PDF.
diff --git a/docs/app4.rst b/docs/app4.rst

new file mode 100644 (file)

index 0000000..4e1a3f2
--- /dev/null
+++ b/docs/app4.rst
@@ -0,0 +1,241 @@
+.. _Appendix 4:
+
+================================================
+Appendix 4: Assorted Technical Information
+================================================
+
+.. _Base-14-Fonts:
+
+PDF Base 14 Fonts
+---------------------
+The following 14 builtin font names **must be supported by every PDF viewer** application. They are available as a dictionary, which maps their full names and their abbreviations in lower case to the full font basename. Wherever a **fontname** must be provided in PyMuPDF, any **key or value** from the dictionary may be used::
+
+    In [2]: fitz.Base14_fontdict
+    Out[2]:
+    {'courier': 'Courier',
+    'courier-oblique': 'Courier-Oblique',
+    'courier-bold': 'Courier-Bold',
+    'courier-boldoblique': 'Courier-BoldOblique',
+    'helvetica': 'Helvetica',
+    'helvetica-oblique': 'Helvetica-Oblique',
+    'helvetica-bold': 'Helvetica-Bold',
+    'helvetica-boldoblique': 'Helvetica-BoldOblique',
+    'times-roman': 'Times-Roman',
+    'times-italic': 'Times-Italic',
+    'times-bold': 'Times-Bold',
+    'times-bolditalic': 'Times-BoldItalic',
+    'symbol': 'Symbol',
+    'zapfdingbats': 'ZapfDingbats',
+    'helv': 'Helvetica',
+    'heit': 'Helvetica-Oblique',
+    'hebo': 'Helvetica-Bold',
+    'hebi': 'Helvetica-BoldOblique',
+    'cour': 'Courier',
+    'coit': 'Courier-Oblique',
+    'cobo': 'Courier-Bold',
+    'cobi': 'Courier-BoldOblique',
+    'tiro': 'Times-Roman',
+    'tibo': 'Times-Bold',
+    'tiit': 'Times-Italic',
+    'tibi': 'Times-BoldItalic',
+    'symb': 'Symbol',
+    'zadb': 'ZapfDingbats'}
+
+Despite this obligation, not all PDF viewers support these fonts correctly and completely; this is especially true for Symbol and ZapfDingbats. Also, the glyph (visual) images will be specific to every reader.
+
+To see how these fonts can be used -- including the **CJK built-in** fonts -- look at the table in :meth:`Page.insertFont`.
+
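The rule "any key or value may be used" can be sketched as a lookup that accepts either form. The helper below is hypothetical (not PyMuPDF's own code) and works on an abbreviated excerpt of the dictionary shown above; it matches names exactly and does not attempt case-insensitive matching.

```python
# abbreviated excerpt of fitz.Base14_fontdict as printed above
BASE14 = {
    "helvetica": "Helvetica",
    "helvetica-bold": "Helvetica-Bold",
    "times-roman": "Times-Roman",
    "helv": "Helvetica",
    "hebo": "Helvetica-Bold",
    "tiro": "Times-Roman",
}


def base14_name(fontname, fontdict=BASE14):
    """Map any key or value of the Base-14 dictionary to the full basename."""
    if fontname in fontdict:             # a (lower case) key
        return fontdict[fontname]
    if fontname in fontdict.values():    # already a full basename
        return fontname
    raise ValueError("not a Base-14 font name: %r" % fontname)


print(base14_name("helv"), base14_name("Times-Roman"))
```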
+------------
+
+.. _AdobeManual:
+
+Adobe PDF References
+---------------------------
+
+This PDF Reference manual published by Adobe is frequently quoted throughout this documentation. It can be viewed and downloaded from `here <http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf>`_.
+
+There is a newer version of this, which can be found `here <https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf>`_. Redaction annotations are an example contained in this one, but not in the earlier version.
+
+------------
+
+.. _SequenceTypes:
+
+Using Python Sequences as Arguments in PyMuPDF
+------------------------------------------------
+When PyMuPDF objects and methods require a Python **list** of numerical values, other Python **sequence types** are also allowed. Python classes are said to implement the **sequence protocol** if they have a *__getitem__()* method.
+
+This basically means that you can interchangeably use Python *list* or *tuple* or even *array.array*, *numpy.array* and *bytearray* types in these cases.
+
+For example, specifying a sequence *"s"* in any of the following ways
+
+* *s = [1, 2]*
+* *s = (1, 2)*
+* *s = array.array("i", (1, 2))*
+* *s = numpy.array((1, 2))*
+* *s = bytearray((1, 2))*
+
+will make it usable in the following example expressions:
+
+* *fitz.Point(s)*
+* *fitz.Point(x, y) + s*
+* *doc.select(s)*
+
+Similarly with all geometry objects :ref:`Rect`, :ref:`IRect`, :ref:`Matrix` and :ref:`Point`.
+
+Because all PyMuPDF geometry classes themselves are special cases of sequences, they (with the exception of :ref:`Quad` -- see below) can be freely used where numerical sequences can be used, e.g. as arguments for functions like *list()*, *tuple()*, *array.array()* or *numpy.array()*. Look at the following snippet to see this work.
+
+>>> import fitz, array, numpy as np
+>>> m = fitz.Matrix(1, 2, 3, 4, 5, 6)
+>>>
+>>> list(m)
+[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
+>>>
+>>> tuple(m)
+(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
+>>>
+>>> array.array("f", m)
+array('f', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
+>>>
+>>> np.array(m)
+array([1., 2., 3., 4., 5., 6.])
+
+.. note:: :ref:`Quad` is a Python sequence object as well and has a length of 4. Its items however are :data:`point_like` -- not numbers. Therefore, the above remarks do not apply.
+
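To illustrate both directions of the sequence protocol in plain Python, here is a sketch with a hypothetical toy class *Pair* (standing in for a geometry object) and a hypothetical function *as_xy* (standing in for a method that accepts a :data:`point_like`). Neither is PyMuPDF code.

```python
import array


class Pair:
    """Toy 2-item sequence: supports len() and indexing like a Point."""

    def __init__(self, x, y):
        self.x, self.y = float(x), float(y)

    def __len__(self):
        return 2

    def __getitem__(self, i):
        return (self.x, self.y)[i]   # raises IndexError past the end


def as_xy(s):
    """Accept any sequence of at least two numbers."""
    return float(s[0]), float(s[1])


# Pair can be consumed like a sequence ...
print(list(Pair(1, 2)), tuple(Pair(1, 2)), array.array("d", Pair(1, 2)))
# ... and any sequence type can be passed where numbers are expected
for s in ([1, 2], (1, 2), array.array("i", (1, 2)), bytearray((1, 2)), Pair(1, 2)):
    print(as_xy(s))
```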
+------------
+
+.. _ReferenialIntegrity:
+
+Ensuring Consistency of Important Objects in PyMuPDF
+------------------------------------------------------------
+PyMuPDF is a Python binding for the C library MuPDF. While a lot of effort has been invested by MuPDF's creators to approximate some sort of an object-oriented behavior, they certainly could not overcome basic shortcomings of the C language in that respect.
+
+Python on the other hand implements the OO-model in a very clean way. The interface code between PyMuPDF and MuPDF consists of two basic files: *fitz.py* and *fitz_wrap.c*. They are created by the excellent SWIG tool for each new version.
+
+When you use one of PyMuPDF's objects or methods, this will result in execution of some code in *fitz.py*, which in turn will call some C code compiled with *fitz_wrap.c*.
+
+Because SWIG goes a long way to keep the Python and the C level in sync, everything works fine, provided a certain set of rules is strictly followed. For example: **never access** a :ref:`Page` object after you have closed (or deleted or set to *None*) the owning :ref:`Document`. Or, less obvious: **never access** a page or any of its children (links or annotations) after you have executed one of the document methods *select()*, *deletePage()*, *insertPage()* ... and more.
+
+But merely no longer accessing invalidated objects is not enough: they should be actively deleted, to also free C-level resources (meaning allocated memory).
+
+The reason for these rules lies in the fact that there is a hierarchical 2-level one-to-many relationship between a document and its pages and also between a page and its links / annotations. To maintain a consistent situation, any of the above actions must lead to a complete reset -- in **Python and, synchronously, in C**.
+
+SWIG cannot know about this and consequently does not do it.
+
+The required logic has therefore been built into PyMuPDF itself in the following way.
+
+1. If a page "loses" its owning document or is being deleted itself, all of its currently existing annotations and links will be made unusable in Python, and their C-level counterparts will be deleted and deallocated.
+
+2. If a document is closed (or deleted or set to *None*) or if its structure has changed, then similarly all currently existing pages and their children will be made unusable, and corresponding C-level deletions will take place. "Structure changes" include methods like *select()*, *deletePage()*, *insertPage()*, *insertPDF()* and so on: all of these will result in a cascade of object deletions.
+
+Programmers will normally not notice any of this. If they do, however, try to access invalidated objects, exceptions will be raised.
+
+Invalidated objects cannot be directly deleted as with Python statements like *del page* or *page = None*, etc. Instead, their *__del__* method must be invoked.
+
+All pages, links and annotations have the property *parent*, which points to the owning object. This is the property that can be checked on the application level: if *obj.parent is None*, then the object's parent is gone, and any reference to its properties or methods will raise an exception informing about this "orphaned" state.
+
+A sample session:
+
+>>> page = doc[n]
+>>> annot = page.firstAnnot
+>>> annot.type                    # everything works fine
+[5, 'Circle']
+>>> page = None                   # this turns 'annot' into an orphan
+>>> annot.type
+<... omitted lines ...>
+RuntimeError: orphaned object: parent is None
+>>>
+>>> # same happens, if you do this:
+>>> annot = doc[n].firstAnnot     # deletes the page again immediately!
+>>> annot.type                    # so, 'annot' is 'born' orphaned
+<... omitted lines ...>
+RuntimeError: orphaned object: parent is None
+
+This shows the cascading effect:
+
+>>> doc = fitz.open("some.pdf")
+>>> page = doc[n]
+>>> annot = page.firstAnnot
+>>> page.rect
+fitz.Rect(0.0, 0.0, 595.0, 842.0)
+>>> annot.type
+[5, 'Circle']
+>>> del doc                       # or doc = None or doc.close()
+>>> page.rect
+<... omitted lines ...>
+RuntimeError: orphaned object: parent is None
+>>> annot.type
+<... omitted lines ...>
+RuntimeError: orphaned object: parent is None
+
+.. note:: Objects outside the above relationship are not included in this mechanism. If you e.g. created a table of contents by *toc = doc.getToC()*, and later close or change the document, then this cannot and does not change variable *toc* in any way. It is your responsibility to refresh such variables as required.
+
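The bookkeeping described above can be pictured with a pure Python toy model. This is only a sketch of the parent / child invalidation idea, with hypothetical class names; PyMuPDF's actual implementation additionally frees the C-level objects and also covers links and annotations.

```python
class ToyDocument:
    """Toy parent: closing it orphans all of its pages."""

    def __init__(self):
        self._pages = []

    def load_page(self, n):
        page = ToyPage(self, n)
        self._pages.append(page)         # remember children for the cascade
        return page

    def close(self):
        for page in self._pages:         # cascade: invalidate all children
            page._orphan()
        self._pages.clear()


class ToyPage:
    """Toy child: every public access checks that the parent still exists."""

    def __init__(self, parent, number):
        self.parent = parent
        self._number = number

    def _orphan(self):
        self.parent = None

    @property
    def number(self):
        if self.parent is None:
            raise RuntimeError("orphaned object: parent is None")
        return self._number


doc = ToyDocument()
page = doc.load_page(3)
print(page.number)   # works while the document is open
doc.close()          # afterwards, page.number raises RuntimeError
```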
+------------
+
+.. _FormXObject:
+
+Design of Method :meth:`Page.showPDFpage`
+--------------------------------------------
+
+Purpose and Capabilities
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The method displays an image of a ("source") page of another PDF document within a specified rectangle of the current ("containing", "target") page.
+
+* **In contrast** to :meth:`Page.insertImage`, this display is vector-based and hence remains accurate across zooming levels.
+* **Just like** :meth:`Page.insertImage`, the size of the display is adjusted to the given rectangle.
+
+The following variations of the display are currently supported:
+
+* Bool parameter *keep_proportion* controls whether to maintain the aspect ratio (default) or not.
+* Rectangle parameter *clip* restricts the visible part of the source page rectangle. Default is the full page.
+* float *rotation* rotates the display by an arbitrary angle (degrees). If the angle is not an integer multiple of 90, only 2 of the 4 corners may be positioned on the target border if also *keep_proportion* is true.
+* Bool parameter *overlay* controls whether to put the image on top (foreground, default) of current page content or not (background).
+
+Use cases include (but are not limited to) the following:
+
+1. "Stamp" a series of pages of the current document with the same image, like a company logo or a watermark.
+2. Combine arbitrary input pages into one output page to support “booklet” or double-sided printing (known as "4-up", "n-up").
+3. Split up (large) input pages into several arbitrary pieces. This is also called “posterization”, because you e.g. can split an A4 page horizontally and vertically, print the 4 pieces enlarged to separate A4 pages, and end up with an A2 version of your original page.
+
+Technical Implementation
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This is done using PDF **"Form XObjects"**, see section 4.9 on page 355 of :ref:`AdobeManual`. On execution of a *Page.showPDFpage(rect, src, pno, ...)*, the following things happen:
+
+    1. The :data:`resources` and :data:`contents` objects of page *pno* in document *src* are copied over to the current document, jointly creating a new **Form XObject** with the following properties. The PDF :data:`xref` number of this object is returned by the method.
+
+        a. */BBox* equals */MediaBox* of the source page
+        b. */Matrix* equals the identity matrix *[1 0 0 1 0 0]*
+        c. */Resources* equals that of the source page. This involves a “deep-copy” of hierarchically nested other objects (including fonts, images, etc.). The complexity involved here is covered by MuPDF’s grafting [#f1]_ technique functions.
+        d. This is a stream object type, and its stream is an exact copy of the combined data of the source page's */Contents* objects.
+
+        This step is only executed once per shown source page. Subsequent displays of the same page only create pointers (done in next step) to this object.
+
+    2. A second **Form XObject** is then created which the target page uses to invoke the display. This object has the following properties:
+
+        a. */BBox* equals the */CropBox* of the source page (or *clip*).
+        b. */Matrix* represents the mapping of */BBox* to the target rectangle.
+        c. */XObject* references the previous XObject via the fixed name *fullpage*.
+        d. The stream of this object contains exactly one fixed statement: */fullpage Do*.
+
+    3. The :data:`resources` and :data:`contents` objects of the target page are now modified as follows.
+
+        a. Add an entry to the */XObject* dictionary of */Resources* with the name *fzFrm<n>* (with n chosen such that this entry is unique on the page).
+        b. Depending on *overlay*, prepend or append a new object to the page's */Contents* array, containing the statement *q /fzFrm<n> Do Q*.
+
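Step 2b stores a matrix that maps */BBox* onto the target rectangle. The sketch below computes such a rectangle-to-rectangle matrix *(a, b, c, d, e, f)* in the simplest case only: one coordinate system, no rotation, no *keep_proportion*, and no handling of the flipped y-axis between PDF and PyMuPDF coordinates. The helper names are hypothetical.

```python
def rect_to_rect(src, dst):
    """Matrix (a, b, c, d, e, f) mapping rectangle src onto rectangle dst.

    Rectangles are (x0, y0, x1, y1). No rotation, no aspect-ratio handling.
    """
    sx = (dst[2] - dst[0]) / (src[2] - src[0])   # horizontal scale
    sy = (dst[3] - dst[1]) / (src[3] - src[1])   # vertical scale
    e = dst[0] - sx * src[0]                     # horizontal shift
    f = dst[1] - sy * src[1]                     # vertical shift
    return (sx, 0.0, 0.0, sy, e, f)


def transform(point, m):
    """Apply a PDF matrix: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    x, y = point
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


m = rect_to_rect((0, 0, 100, 200), (50, 50, 250, 450))
print(m, transform((100, 200), m))
```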
+
+.. _RedirectMessages:
+
+Redirecting Error and Warning Messages
+--------------------------------------------
+Since MuPDF version 1.16, error and warning messages can be redirected via an official plugin.
+
+PyMuPDF will put error messages to *sys.stderr* prefixed with the string "mupdf:". Warnings are internally stored and can be accessed via *fitz.TOOLS.mupdf_warnings()*. There also is a function to empty this store.
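To make the split between printed errors and stored warnings concrete, here is a pure-Python toy model. It does not call MuPDF or PyMuPDF; the class `MessageSink` and its methods are hypothetical stand-ins that only mirror the behavior described above: errors are printed with a "mupdf:" prefix, warnings are buffered until read or reset.

```python
import io
import sys


class MessageSink:
    """Toy model: errors go to a stream with a prefix, warnings are buffered."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stderr
        self._warnings = []

    def error(self, message):
        # errors are shown immediately, with the "mupdf:" prefix
        print(f"mupdf: {message}", file=self.stream)

    def warning(self, message):
        # warnings are only stored, never printed
        self._warnings.append(message)

    def warnings(self):
        """Return all stored warnings, one per line."""
        return "\n".join(self._warnings)

    def reset_warnings(self):
        """Empty the warnings store."""
        self._warnings.clear()


# usage: capture the "stderr" output in a buffer to inspect it
buf = io.StringIO()
sink = MessageSink(buf)
sink.error("syntax error in content stream")
sink.warning("ignoring broken object")
```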
+
+
+.. rubric:: Footnotes
+
+.. [#f1] MuPDF supports "deep-copying" objects between PDF documents. To avoid duplicate data in the target, it uses so-called "graftmaps", a kind of scratchpad: for each object to be copied, its :data:`xref` number is looked up in the graftmap. If found, copying is skipped. Otherwise, the new :data:`xref` is recorded and the copy takes place. PyMuPDF makes use of this technique in two places so far: :meth:`Document.insertPDF` and :meth:`Page.showPDFpage`. This process is fast and very efficient, because it prevents multiple copies of typically large and frequently referenced data, like images and fonts. However, you may still want to consider using garbage collection (option *garbage=4* of :meth:`Document.save`) in any of the following cases:
+
+    1. The target PDF is not new / empty: grafting does not check whether resources (e.g. images, fonts) already exist in the target document.
+    2. Using :meth:`Page.showPDFpage` for more than one source document: each grafting occurs **within one source** PDF only, not across multiple.
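The graftmap idea can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not MuPDF's code: a "document" is a dict mapping xref numbers to objects, indirect references are represented by a hypothetical `Ref` class, and `graft` copies an object from `src` to `dst`, consulting the graftmap so that each source xref is copied at most once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Indirect reference to an object by its xref number (toy model)."""
    xref: int


def graft(obj, src, dst, graftmap):
    """Deep-copy obj from document src into dst and return the copy."""
    if isinstance(obj, Ref):
        if obj.xref in graftmap:              # already copied: reuse it
            return Ref(graftmap[obj.xref])
        new_xref = max(dst, default=0) + 1
        graftmap[obj.xref] = new_xref         # record first (handles cycles)
        dst[new_xref] = None                  # reserve the slot
        dst[new_xref] = graft(src[obj.xref], src, dst, graftmap)
        return Ref(new_xref)
    if isinstance(obj, dict):
        return {k: graft(v, src, dst, graftmap) for k, v in obj.items()}
    if isinstance(obj, list):
        return [graft(v, src, dst, graftmap) for v in obj]
    return obj  # numbers, names and strings are copied by value


# two "pages" (xrefs 2 and 3) sharing one "font" (xref 1)
src = {1: {"Type": "Font"}, 2: {"Font": Ref(1)}, 3: {"Font": Ref(1)}}
dst, graftmap = {}, {}
r2 = graft(Ref(2), src, dst, graftmap)
r3 = graft(Ref(3), src, dst, graftmap)
```

In this model the shared font ends up in the target only once. Because the graftmap is keyed by *source* xref numbers, it is only meaningful for one source document, which illustrates why grafting cannot detect duplicates across several sources (case 2 above).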
diff --git a/docs/changes.rst b/docs/changes.rst

new file mode 100644 (file)

index 0000000..9e0e7e1
--- /dev/null
+++ b/docs/changes.rst
@@ -0,0 +1,751 @@
+Change Logs
+===============
+
+Changes in Version 1.17.4
+---------------------------
+* **Fixed** issue `#561 <https://github.com/pymupdf/PyMuPDF/issues/561>`_. Handling of more than 10 :ref:`Font` objects on one page should now work correctly.
+* **Fixed** issue `#562 <https://github.com/pymupdf/PyMuPDF/issues/562>`_. Annotation pixmaps are no longer derived from the page pixmap, thus avoiding unintended inclusion of page content.
+* **Fixed** issue `#559 <https://github.com/pymupdf/PyMuPDF/issues/559>`_. This **MuPDF** bug is temporarily fixed by using a pre-release of MuPDF's next version.
+* **Added** utility function :meth:`repair_mono_font` for correcting displayed character spacing for some mono-spaced fonts.
+* **Added** utility method :meth:`Document.need_appearances` for fine-controlling Form PDF behavior. Addresses issue `#563 <https://github.com/pymupdf/PyMuPDF/issues/563>`_.
+* **Added** utility function :meth:`sRGB_to_pdf` to recover the PDF color triple for a given color integer in sRGB format.
+* **Added** utility function :meth:`sRGB_to_rgb` to recover the (R, G, B) color triple for a given color integer in sRGB format.
+* **Added** utility function :meth:`make_table` which delivers table cells for a given rectangle and desired numbers of columns and rows.
+* **Added** support for optional fonts in repository `pymupdf-fonts <https://github.com/pymupdf/pymupdf-fonts>`_.
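The color conversions and the table-cell computation listed above are simple arithmetic. The following pure-Python sketch shows the idea; `split_srgb`, `srgb_to_pdf_floats` and `table_cells` are hypothetical illustrations with assumed signatures, not PyMuPDF's :meth:`sRGB_to_rgb`, :meth:`sRGB_to_pdf` or :meth:`make_table`, and rectangles are plain (x0, y0, x1, y1) tuples rather than :ref:`Rect` objects.

```python
def split_srgb(color):
    """Split an sRGB integer 0xRRGGBB into an (R, G, B) tuple of 0..255 ints."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


def srgb_to_pdf_floats(color):
    """The same color as a PDF-style triple of floats in the range 0..1."""
    return tuple(c / 255 for c in split_srgb(color))


def table_cells(rect, cols=1, rows=1):
    """Split rect into rows x cols equally sized cells, returned row by row."""
    x0, y0, x1, y1 = rect
    w = (x1 - x0) / cols
    h = (y1 - y0) / rows
    return [
        [(x0 + c * w, y0 + r * h, x0 + (c + 1) * w, y0 + (r + 1) * h)
         for c in range(cols)]
        for r in range(rows)
    ]
```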
+
+Changes in Version 1.17.3
+---------------------------
+* **Fixed** an undocumented issue that prevented fully cleaning a PDF page when using :meth:`Page.cleanContents`.
+* **Fixed** issue `#540 <https://github.com/pymupdf/PyMuPDF/issues/540>`_. Text extraction for EPUB should again work correctly.
+* **Fixed** issue `#548 <https://github.com/pymupdf/PyMuPDF/issues/548>`_. Documentation now includes ``LINK_NAMED``.
+* **Added** new parameter to control start of text in :meth:`TextWriter.fillTextbox`. Implements `#549 <https://github.com/pymupdf/PyMuPDF/issues/549>`_.
+* **Changed** documentation of :meth:`Page.addRedactAnnot` to explain the usage of non-builtin fonts.
+
+Changes in Version 1.17.2
+---------------------------
+* **Fixed** issue `#533 <https://github.com/pymupdf/PyMuPDF/issues/533>`_.
+* **Added** options to modify 'Redact' annotation appearance. Implements `#535 <https://github.com/pymupdf/PyMuPDF/issues/535>`_.
+
+
+Changes in Version 1.17.1
+---------------------------
+* **Fixed** issue `#520 <https://github.com/pymupdf/PyMuPDF/issues/520>`_.
+* **Fixed** issue `#525 <https://github.com/pymupdf/PyMuPDF/issues/525>`_. Vertices for 'Ink' annots should now be correct.
+* **Fixed** issue `#524 <https://github.com/pymupdf/PyMuPDF/issues/524>`_. It is now possible to query and set rotation for applicable annotation types.
+
+Also, inline documentation has been significantly improved for better support of interactive help.
+
+Changes in Version 1.17.0
+---------------------------
+This version is based on MuPDF v1.17. Following are highlights of new and changed features:
+
+* **Added** extended language support for annotations and widgets: a mixture of Latin, Greek, Russian, Chinese, Japanese and Korean characters can now be used in 'FreeText' annotations and text widgets. No special arrangement is required to use it.
+
+* Faster page access is implemented for documents supporting a "chapter" structure. Currently, this applies to EPUB documents. This comes with several new :ref:`Document` methods and changes to :meth:`Document.loadPage` and the "indexed" page access *doc[n]*: in addition to specifying a page number as before, a tuple *(chapter, pno)* can be specified to identify the desired page.
+
+* **Changed:** Improved support of redaction annotations: images overlapped by redactions are **permanently modified** by erasing the overlap areas. Links are also removed if overlapped by redactions. This is now fully in sync with PDF specifications.
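Converting between an overall page number and a *(chapter, pno)* tuple only requires the page count of each chapter. The helpers below are hypothetical and not PyMuPDF API; they assume the per-chapter page counts are already known and simply illustrate how the two addressing schemes correspond.

```python
from itertools import accumulate


def to_location(page_number, chapter_counts):
    """Convert an overall page number into a (chapter, pno) tuple."""
    if page_number >= 0:
        for chapter, count in enumerate(chapter_counts):
            if page_number < count:
                return (chapter, page_number)
            page_number -= count  # skip the pages of this chapter
    raise IndexError("page number out of range")


def to_page_number(location, chapter_counts):
    """Convert a (chapter, pno) tuple back into an overall page number."""
    chapter, pno = location
    if not (0 <= chapter < len(chapter_counts) and 0 <= pno < chapter_counts[chapter]):
        raise IndexError("location out of range")
    starts = [0, *accumulate(chapter_counts)]  # first page number of each chapter
    return starts[chapter] + pno
```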
+
+Other changes:
+
+* **Changed** :meth:`TextWriter.writeText` to support the *"morph"* parameter.
+* **Added** methods :meth:`Rect.morph`, :meth:`IRect.morph`, and :meth:`Quad.morph`, which return a new :ref:`Quad`.
+* **Changed** :meth:`Page.addFreetextAnnot` to support text alignment via a new *"align"* parameter.
+* **Fixed** issue `#508 <https://github.com/pymupdf/PyMuPDF/issues/508>`_. Improved image rectangle calculation to hopefully deliver correct values in most if not all cases.
+* **Fixed** issue `#502 <https://github.com/pymupdf/PyMuPDF/issues/502>`_.
+* **Fixed** issue `#500 <https://github.com/pymupdf/PyMuPDF/issues/500>`_. :meth:`Document.convertToPDF` should no longer cause memory leaks.
+* **Fixed** issue `#496 <https://github.com/pymupdf/PyMuPDF/issues/496>`_. Annotations and widgets / fields are now added or modified using the coordinates of the **unrotated page**. This behavior is now in sync with other methods modifying PDF pages.
+* **Added** :attr:`Page.rotationMatrix` and :attr:`Page.derotationMatrix` to support coordinate transformations between the rotated and the original versions of a PDF page.
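Morphing transforms points around a fixed point: move the fixed point to the origin, apply the matrix, then move it back. The sketch below applies this to the four corners of a rectangle, returning them in the order upper-left, upper-right, lower-left, lower-right. It assumes PDF-style matrices (a, b, c, d, e, f) acting on points as x' = a*x + c*y + e, y' = b*x + d*y + f; `apply` and `morph_rect` are hypothetical names, not PyMuPDF's :meth:`Rect.morph`.

```python
def apply(matrix, point):
    """Transform a point with a PDF-style (a, b, c, d, e, f) matrix."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def morph_rect(rect, fixpoint, matrix):
    """Return the corners (ul, ur, ll, lr) of rect morphed around fixpoint."""
    x0, y0, x1, y1 = rect
    px, py = fixpoint
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    result = []
    for x, y in corners:
        # shift so that fixpoint becomes the origin, transform, shift back
        tx, ty = apply(matrix, (x - px, y - py))
        result.append((tx + px, ty + py))
    return result
```

The result is a general quadrilateral rather than a rectangle, which is why such a method naturally returns a :ref:`Quad`.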
+
+Potential code breaking changes:
+
+* The private method ``Page._getTransformation()`` has been removed. Use the public :attr:`Page.transformationMatrix` instead.
+
+
+Changes in Version 1.16.18
+---------------------------
+This version introduces several new features around PDF text output. The motivation is to simplify this task, while at the same time offering extended features.
+
+One major achievement is using MuPDF's capabilities to dynamically choose fallback fonts whenever a character cannot be found in the current one. This works seamlessly for Base-14 fonts in combination with CJK fonts (China, Japan, Korea). So a text may contain **any combination of characters** from the Latin, Greek, Russian, Chinese, Japanese and Korean languages.
+
+* **Fixed** issue `#493 <https://github.com/pymupdf/PyMuPDF/issues/493>`_. ``Pixmap(doc, xref)`` should now again correctly represent the loaded image object.
+* **Fixed** issue `#488 <https://github.com/pymupdf/PyMuPDF/issues/488>`_. Widget names are now modifiable.
+* **Added** new class :ref:`Font` which represents a font.
+* **Added** new class :ref:`TextWriter` which serves as a container for text to be written on a page.
+* **Added** :meth:`Page.writeText` to write one or more :ref:`TextWriter` objects to the page.
+
+
+Changes in Version 1.16.17
+---------------------------
+
+* **Fixed** issue `#479 <https://github.com/pymupdf/PyMuPDF/issues/479>`_. PyMuPDF should now more correctly report image resolutions. This applies both to images (either from image files or extracted from PDF documents) and to pixmaps created from images.
+* **Added** :meth:`Pixmap.setResolution` which sets the image resolution in x and y directions.
+
+Changes in Version 1.16.16
+---------------------------
+
+* **Fixed** issue `#477 <https://github.com/pymupdf/PyMuPDF/issues/477>`_.
+* **Fixed** issue `#476 <https://github.com/pymupdf/PyMuPDF/issues/476>`_.
+* **Changed** annotation line end symbol coloring and fixed an error coloring the interior of 'Polyline' / 'Polygon' annotations.
+
+Changes in Version 1.16.14
+---------------------------
+
+* **Changed** text marker annotations to accept parameters beyond just quadrilaterals such that now **text lines between two given points can be marked**.
+
+* **Added** :meth:`Document.scrub` which **removes potentially sensitive data** from a PDF. Implements `#453 <https://github.com/pymupdf/PyMuPDF/issues/453>`_.
+
+* **Added** :meth:`Annot.blendMode` which returns the **blend mode** of annotations.
+
+* **Added** :meth:`Annot.setBlendMode` to set the annotation's blend mode. This resolves issue `#416 <https://github.com/pymupdf/PyMuPDF/issues/416>`_.
+* **Changed** :meth:`Annot.update` to accept additional parameters for setting blend mode and opacity.
+* **Added** advanced graphics features to **control the anti-aliasing values**, :meth:`Tools.set_aa_level`. Resolves `#467 <https://github.com/pymupdf/PyMuPDF/issues/467>`_.
+
+* **Fixed** issue `#474 <https://github.com/pymupdf/PyMuPDF/issues/474>`_.
+* **Fixed** issue `#466 <https://github.com/pymupdf/PyMuPDF/issues/466>`_.
+
+
+
+Changes in Version 1.16.13
+---------------------------
+
+* **Added** :meth:`Document.getPageXObjectList` which returns a list of **Form XObjects** of the page.
+* **Added** :meth:`Page.setMediaBox` for changing the physical PDF page size.
+* **Added** :ref:`Page` methods which have been internal before: :meth:`Page.cleanContents` (= :meth:`Page._cleanContents`), :meth:`Page.getContents` (= :meth:`Page._getContents`), :meth:`Page.getTransformation` (= :meth:`Page._getTransformation`).
+
+
+
+Changes in Version 1.16.12
+---------------------------
+* **Fixed** issue `#447 <https://github.com/pymupdf/PyMuPDF/issues/447>`_.
+* **Fixed** issue `#461 <https://github.com/pymupdf/PyMuPDF/issues/461>`_.
+* **Fixed** issue `#397 <https://github.com/pymupdf/PyMuPDF/issues/397>`_.
+* **Fixed** issue `#463 <https://github.com/pymupdf/PyMuPDF/issues/463>`_.
+* **Added** JavaScript support to PDF form fields, thereby fixing `#454 <https://github.com/pymupdf/PyMuPDF/issues/454>`_.
+* **Added** a new annotation method :meth:`Annot.delete_responses`, which removes 'Popup' and response annotations referring to the current one. Mainly serves data protection purposes.
+* **Added** a new form field method :meth:`Widget.reset`, which resets the field value to its default.
+* **Changed** and extended handling of redactions: images and XObjects are removed if *contained* in a redaction rectangle. Partial overlaps are just covered by the redaction background color. Now an *overlay* text can be specified to be inserted in the rectangle area to **take the place of the deleted original** text. This resolves `#434 <https://github.com/pymupdf/PyMuPDF/issues/434>`_.
+
+Changes in Version 1.16.11
+---------------------------
+* **Added** support for redaction annotations via methods :meth:`Page.addRedactAnnot` and :meth:`Page.apply_redactions`.
+* **Fixed** issue #426 ("PolygonAnnotation in 1.16.10 version").
+* **Fixed** documentation only issues `#443 <https://github.com/pymupdf/PyMuPDF/issues/443>`_ and `#444 <https://github.com/pymupdf/PyMuPDF/issues/444>`_.
+
+Changes in Version 1.16.10
+---------------------------
+* **Fixed** issue #421 ("annot.setRect(rect) has no effect on text Annotation")
+* **Fixed** issue #417 ("Strange behavior for page.deleteAnnot on 1.16.9 compare to 1.13.20")
+* **Fixed** issue #415 ("Annot.setOpacity throws mupdf warnings")
+* **Changed** all "add annotation / widget" methods to store a unique name in the */NM* PDF key.
+* **Changed** :meth:`Annot.setInfo` to also accept direct parameters in addition to a dictionary.
+* **Changed** :attr:`Annot.info` to now also show the annotation's unique id (*/NM* PDF key) if present.
+* **Added** :meth:`Page.annot_names` which returns a list of all annotation names (*/NM* keys).
+* **Added** :meth:`Page.load_annot` which loads an annotation given its unique id (*/NM* key).
+* **Added** :meth:`Document.reload_page` which provides a new copy of a page after finishing any pending updates to it.
+
+
+Changes in Version 1.16.9
+---------------------------
+* **Fixed** #412 ("Feature Request: Allow controlling whether TOC entries should be collapsed")
+* **Fixed** #411 ("Seg Fault with page.firstWidget")
+* **Fixed** #407 ("Annot.setOpacity trouble")
+* **Changed** methods :meth:`Annot.setBorder`, :meth:`Annot.setColors`, :meth:`Link.setBorder`, and :meth:`Link.setColors` to also accept direct parameters, and not just cumbersome dictionaries.
+
+Changes in Version 1.16.8
+---------------------------
+* **Added** several new methods to the :ref:`Document` class, which make dealing with PDF low-level structures easier. I also decided to provide them as "normal" methods (as opposed to private ones starting with an underscore "_"). These are :meth:`Document.xrefObject`, :meth:`Document.xrefStream`, :meth:`Document.xrefStreamRaw`, :meth:`Document.PDFTrailer`, :meth:`Document.PDFCatalog`, :meth:`Document.metadataXML`, :meth:`Document.updateObject`, :meth:`Document.updateStream`.
+* **Added** :meth:`Tools.mupdf_display_errors` which sets the display of mupdf errors on *sys.stderr*.
+* **Added** a commandline facility. This is a major new feature: you can now invoke several utility functions via *"python -m fitz ..."*. It should make many of the most trivial scripts unnecessary. Please refer to :ref:`Module`.
+
+
+Changes in Version 1.16.7
+---------------------------
+Minor changes to better synchronize the binary image streams of :ref:`TextPage` image blocks and :meth:`Document.extractImage` images.
+
+* **Fixed** issue #394 ("PyMuPDF Segfaults when using TOOLS.mupdf_warnings()").
+* **Changed** redirection of MuPDF error messages: apart from writing them to Python *sys.stderr*, they are now also stored with the MuPDF warnings.
+* **Changed** :meth:`Tools.mupdf_warnings` to automatically empty the store (if not deactivated via a parameter).
+* **Changed** :meth:`Page.getImageBbox` to return an **infinite rectangle** if the image could not be located on the page -- instead of raising an exception.
+
+
+Changes in Version 1.16.6
+---------------------------
+* **Fixed** issue #390 ("Incomplete deletion of annotations").
+* **Changed** :meth:`Page.searchFor` / :meth:`Document.searchPageFor` to also support the *flags* parameter, which controls the data included in a :ref:`TextPage`.
+* **Changed** :meth:`Document.getPageImageList`, :meth:`Document.getPageFontList` and their :ref:`Page` counterparts to support a new parameter *full*. If true, the returned items will contain the :data:`xref` of the *Form XObject* where the font or image is referenced.
+
+Changes in Version 1.16.5
+---------------------------
+More performance improvements for text extraction.
+
+* **Fixed** second part of issue #381 (see item in v1.16.4).
+* **Added** :meth:`Page.getTextPage`, so it is no longer required to create an intermediate display list for text extractions. Page level wrappers for text extraction and text searching are now based on this, which should improve performance by ca. 5%.
+
+Changes in Version 1.16.4
+---------------------------
+
+* **Fixed** issue #381 ("TextPage.extractDICT ... failed ... after upgrading ... to 1.16.3")
+* **Added** method :meth:`Document.pages` which delivers a generator iterator over a page range.
+* **Added** method :meth:`Page.links` which delivers a generator iterator over the links of a page.
+* **Added** method :meth:`Page.annots` which delivers a generator iterator over the annotations of a page.
+* **Added** method :meth:`Page.widgets` which delivers a generator iterator over the form fields of a page.
+* **Changed** :attr:`Document.isFormPDF` to now contain the number of widgets, and *False* if not a PDF or this number is zero.
+
+
+Changes in Version 1.16.3
+---------------------------
+Minor changes compared to version 1.16.2. The code of the "dict" and "rawdict" variants of :meth:`Page.getText` has been ported to C, which has greatly improved their performance. This improvement is most noticeable with text-oriented documents, where they should now execute almost twice as fast.
+
+* **Fixed** issue #369 ("mupdf: cmsCreateTransform failed") by removing ICC colorspace support.
+* **Changed** :meth:`Page.getText` to accept additional keywords "blocks" and "words". These will deliver the results of :meth:`Page.getTextBlocks` and :meth:`Page.getTextWords`, respectively. So all text extraction methods are now available via a uniform API. Correspondingly, there are now new methods :meth:`TextPage.extractBLOCKS` and :meth:`TextPage.extractWORDS`.
+* **Changed** :meth:`Page.getText` to set the bit indicator *TEXT_INHIBIT_SPACES* to **off** by default. Insertion of additional spaces is therefore **not suppressed** by default.
+
+Changes in Version 1.16.2
+---------------------------
+* **Changed** text extraction methods of :ref:`Page` to allow detail control of the amount of extracted data.
+* **Added** :meth:`planishLine` which maps a given line (defined as a pair of points) to the x-axis.
+* **Fixed** an issue (without a GitHub number) which brought down the interpreter when encountering certain non-UTF-8 encodable characters while using :meth:`Page.getText` with the "dict" option.
+* **Fixed** issue #362 ("Memory Leak with getText('rawDICT')").
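A matrix that maps a line onto the x-axis can be built from a translation followed by a rotation: move the first point to the origin, then rotate by the negative angle of the line, so the second point lands on the positive x-axis at its original distance. This pure-Python sketch assumes PDF-style matrices (a, b, c, d, e, f) with x' = a*x + c*y + e, y' = b*x + d*y + f; `planish_matrix` is a hypothetical name, not PyMuPDF's :meth:`planishLine`.

```python
import math


def planish_matrix(p1, p2):
    """Matrix mapping p1 to (0, 0) and p2 onto the positive x-axis."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p1 and p2 must differ")
    cos_a, sin_a = dx / length, dy / length
    # rotation by -angle applied to (x - x1, y - y1):
    #   x' =  cos_a * (x - x1) + sin_a * (y - y1)
    #   y' = -sin_a * (x - x1) + cos_a * (y - y1)
    a, b, c, d = cos_a, -sin_a, sin_a, cos_a
    e = -(a * x1 + c * y1)
    f = -(b * x1 + d * y1)
    return (a, b, c, d, e, f)


def apply(matrix, point):
    """Transform a point with a PDF-style (a, b, c, d, e, f) matrix."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)
```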
+
+Changes in Version 1.16.1
+---------------------------
+* **Added** property :attr:`Quad.isConvex` which checks whether the line connecting any two points of the quad is contained in it.
+* **Changed** :meth:`Document.insertPDF` to now allow dropping or including links and annotations independently during the copy. Fixes issue #352 ("Corrupt PDF data and ..."), which seemed to intermittently occur when using the method for some problematic PDF files.
+* **Fixed** a bug which, in matrix division using the syntax *"m1/m2"*, caused matrix *"m1"* to be **replaced** by the result instead of delivering a new matrix.
+* **Fixed** issue #354 ("SyntaxWarning with Python 3.8"). We now always use *"=="* for literals (instead of the *"is"* Python keyword).
+* **Fixed** issue #353 ("mupdf version check"), to no longer refuse the import when there are only patch level deviations from MuPDF.
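Matrix division *m1/m2* can be read as multiplying *m1* by the inverse of *m2*; a correct implementation returns a new matrix and leaves both operands untouched. The sketch below uses a small immutable class, `Mat`, which is hypothetical and not PyMuPDF's :ref:`Matrix`; it assumes the PDF row-vector convention, where *m1 * m2* applies *m1* first.

```python
from typing import NamedTuple


class Mat(NamedTuple):
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def __mul__(self, o):
        # concatenation: apply self first, then o (PDF row-vector convention)
        return Mat(
            self.a * o.a + self.b * o.c,
            self.a * o.b + self.b * o.d,
            self.c * o.a + self.d * o.c,
            self.c * o.b + self.d * o.d,
            self.e * o.a + self.f * o.c + o.e,
            self.e * o.b + self.f * o.d + o.f,
        )

    def inverse(self):
        det = self.a * self.d - self.b * self.c
        if det == 0:
            raise ZeroDivisionError("matrix is not invertible")
        a, b = self.d / det, -self.b / det
        c, d = -self.c / det, self.a / det
        return Mat(a, b, c, d, -(self.e * a + self.f * c), -(self.e * b + self.f * d))

    def __truediv__(self, o):
        # a new matrix is returned; tuples are immutable, so self stays unchanged
        return self * o.inverse()
```

Building on an immutable tuple makes the bug described above impossible by construction: no operator can overwrite an operand in place.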
+
+
+
+Changes in Version 1.16.0
+---------------------------
+This major new version of MuPDF comes with several nice new or changed features. Some of them imply programming API changes, however. This is a synopsis of what has changed:
+
+* PDF document encryption and decryption is now **fully supported**. This includes setting **permissions**, **passwords** (user and owner passwords) and the desired encryption method.
+* In response to the new encryption features, PyMuPDF returns an integer (i.e. a combination of bits) for document permissions, and no longer a dictionary.
+* Redirection of MuPDF errors and warnings is now natively supported. PyMuPDF redirects error messages from MuPDF to *sys.stderr* and no longer buffers them. Warnings continue to be buffered and will not be displayed. Functions exist to access and reset the warnings buffer.
+* Annotations are now **only supported for PDF**.
+* Annotations and widgets (form fields) are now **separate object chains** on a page (although widgets technically still **are** PDF annotations). This means that you will **never encounter widgets** when using :attr:`Page.firstAnnot` or :meth:`Annot.next`. You must use :attr:`Page.firstWidget` and :meth:`Widget.next` to access form fields.
+* As part of MuPDF's changes regarding widgets, only the following four fonts are supported when **adding** or **changing** form fields: **Courier, Helvetica, Times-Roman** and **ZapfDingBats**.
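With permissions delivered as an integer, individual rights are checked with a bitwise AND. The sketch below assumes that this integer follows the bit layout of the user access permissions in the PDF specification (bit *n* has the value ``1 << (n - 1)``); the dictionary keys are illustrative labels, not PyMuPDF constant names.

```python
# Bit values of the user access permissions in the PDF specification.
# The names are our own labels, not PyMuPDF constants.
PERMISSION_BITS = {
    "print": 1 << 2,               # bit 3
    "modify": 1 << 3,              # bit 4
    "copy": 1 << 4,                # bit 5
    "annotate": 1 << 5,            # bit 6
    "fill_forms": 1 << 8,          # bit 9
    "accessibility": 1 << 9,       # bit 10
    "assemble": 1 << 10,           # bit 11
    "print_high_quality": 1 << 11, # bit 12
}


def decode_permissions(perm):
    """Turn a permissions integer into a dict of booleans."""
    return {name: bool(perm & bit) for name, bit in PERMISSION_BITS.items()}
```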
+
+List of change details:
+
+* **Added** :meth:`Document.can_save_incrementally` which checks conditions that are preventing use of option *incremental=True* of :meth:`Document.save`.
+* **Added** :attr:`Page.firstWidget` which points to the first field on a page.
+* **Added** :meth:`Page.getImageBbox` which returns the rectangle occupied by an image shown on the page.
+* **Added** :meth:`Annot.setName` which lets you change the (icon) name field.
+* **Added** outputting the text color in :meth:`Page.getText`: the *"dict"*, *"rawdict"* and *"xml"* options now also show the color in sRGB format.
+* **Changed** :attr:`Document.permissions` to now contain an integer of bool indicators -- was a dictionary before.
+* **Changed** :meth:`Document.save`, :meth:`Document.write`, which now fully support password-based decryption and encryption of PDF files.
+* **Changed the names of all Python constants** related to annotations and widgets. Please make sure to consult the **Constants and Enumerations** chapter if your script is dealing with these two classes. This decision goes back to the dropped support for non-PDF annotations. The **old names** (starting with "ANNOT_*" or "WIDGET_*") will be available as deprecated synonyms.
+* **Changed** font support for widgets: only *Cour* (Courier), *Helv* (Helvetica, default), *TiRo* (Times-Roman) and *ZaDb* (ZapfDingBats) are accepted when **adding or changing** form fields. Only the plain versions are possible -- not their italic or bold variations. **Reading** a widget, however, will show its original font.
+* **Changed** the name of the warnings buffer to :meth:`Tools.mupdf_warnings` and the function to empty this buffer is now called :meth:`Tools.reset_mupdf_warnings`.
+* **Changed** :meth:`Page.getPixmap`, :meth:`Document.getPagePixmap`: a new bool argument *annots* can now be used to **suppress the rendering of annotations** on the page.
+* **Changed** :meth:`Page.addFileAnnot` and :meth:`Page.addTextAnnot` to enable setting an icon.
+* **Removed** widget-related methods and attributes from the :ref:`Annot` object.
+* **Removed** :ref:`Document` attributes *openErrCode*, *openErrMsg*, and :ref:`Tools` attributes / methods *stderr*, *reset_stderr*, *stdout*, and *reset_stdout*.
+* **Removed** **third-party zlib** dependency in PyMuPDF: compression functions are now available in MuPDF. Source installers of PyMuPDF may now omit this extra installation step.
+
+No version published for MuPDF v1.15.0
+------------------------------------------------------
+
+Changes in Version 1.14.20 / 1.14.21
+-------------------------------------
+* **Changed** text marker annotations to support multiple rectangles / quadrilaterals. This fixes issue #341 ("Question : How to addhighlight so that a string spread across more than a line is covered by one highlight?") and similar (#285).
+* **Fixed** issue #331 ("Importing PyMuPDF changes warning filtering behaviour globally").
+
+
+Changes in Version 1.14.19
+---------------------------
+* **Fixed** issue #319 ("InsertText function error when use custom font").
+* **Added** new method :meth:`Document.getSigFlags` which returns information on whether a PDF is signed. Resolves issue #326 ("How to detect signature in a form pdf?").
+
+
+Changes in Version 1.14.17
+---------------------------
+* **Added** :meth:`Document.fullcopyPage` to make full page copies within a PDF (not just copied references as :meth:`Document.copyPage` does).
+* **Changed** :meth:`Page.getPixmap`, :meth:`Document.getPagePixmap` now use *alpha=False* as default.
+* **Changed** text extraction: the span dictionary now (again) contains its rectangle under the *bbox* key.
+* **Changed** :meth:`Document.movePage` and :meth:`Document.copyPage` to use direct functions instead of wrapping :meth:`Document.select` -- similar to :meth:`Document.deletePage` in v1.14.16.
+
+Changes in Version 1.14.16
+---------------------------
+* **Changed** :ref:`Document` methods around PDF */EmbeddedFiles* to no longer use MuPDF's "portfolio" functions. That support will be dropped in MuPDF v1.15 -- therefore another solution was required.
+* **Changed** :meth:`Document.embeddedFileCount` to be a function (was an attribute).
+* **Added** new method :meth:`Document.embeddedFileNames` which returns a list of names of embedded files.
+* **Changed** :meth:`Document.deletePage` and :meth:`Document.deletePageRange` to internally no longer use :meth:`Document.select`, but instead use functions to perform the deletion directly. As it has turned out, the :meth:`Document.select` method yields invalid outline trees (tables of content) for very complex PDFs and sophisticated use of annotations.
+
+
+Changes in Version 1.14.15
+---------------------------
+* **Fixed** issues #301 ("Line cap and Line join"), #300 ("How to draw a shape without outlines") and #298 ("utils.updateRect exception"). These bugs pertain to drawing shapes with PyMuPDF. Drawing shapes without any border is fully supported. Line cap styles and line join styles are now differentiated and support all possible PDF values (0, 1, 2) instead of just being a bool. The previous parameter *roundCap* is deprecated in favor of *lineCap* and *lineJoin* and will be deleted in the next release.
+* **Fixed** issue #290 ("Memory Leak with getText('rawDICT')"). This bug caused memory not to be (completely) freed after invoking the "dict", "rawdict" and "json" versions of :meth:`Page.getText`.
+
+
+Changes in Version 1.14.14
+---------------------------
+* **Added** new low-level function :meth:`ImageProperties` to determine a number of characteristics for an image.
+* **Added** new low-level function :meth:`Document.isStream`, which checks whether an object is of stream type.
+* **Changed** low-level functions :meth:`Document._getXrefString` and :meth:`Document._getTrailerString` to return object definitions in a formatted form by default, which makes parsing easy.
+
+Changes in Version 1.14.13
+---------------------------
+* **Changed** methods working with binary input: while still supporting bytes and bytearray objects, they now also accept *io.BytesIO* input, using its *getvalue()* method. This pertains to document creation, embedded files, FileAttachment annotations, pixmap creation and others. Fixes issue #274 ("Segfault when using BytesIO as a stream for insertImage").
+* **Fixed** issue #278 ("Is insertImage(keep_proportion=True) broken?"). Images are now correctly presented when keeping aspect ratio.
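Accepting *io.BytesIO* alongside bytes and bytearray boils down to normalizing the input before use. A minimal sketch; `as_bytes` is a hypothetical helper, not part of PyMuPDF's API:

```python
import io


def as_bytes(data):
    """Return the binary content of bytes, bytearray or io.BytesIO input."""
    if isinstance(data, io.BytesIO):
        # getvalue() returns the whole buffer, regardless of the read position
        return data.getvalue()
    if isinstance(data, (bytes, bytearray)):
        return bytes(data)
    raise TypeError(f"unsupported binary input: {type(data).__name__}")
```

Using *getvalue()* rather than *read()* means a stream that has already been read (partially or completely) still yields its full content.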
+
+
+Changes in Version 1.14.12
+---------------------------
+* **Changed** the draw methods of :ref:`Page` and :ref:`Shape` to support not only RGB, but also GRAY and CMYK colorspaces. This solves issue #270 ("Is there a way to use CMYK color to draw shapes?"). This change also applies to the text insertion methods of :ref:`Shape` and :ref:`Page`.
+* **Fixed** issue #269 ("AttributeError in Document.insertPage()"), which occurred when using :meth:`Document.insertPage` with text insertion.
+
+
+Changes in Version 1.14.11
+---------------------------
+* **Changed** :meth:`Page.showPDFpage` to always position the source rectangle centered in the target. This method now also supports **rotation by arbitrary angles**. The argument *reuse_xref* has been deprecated: prevention of duplicates is now **handled internally**.
+* **Changed** :meth:`Page.insertImage` to support rotated display of the image and keeping the aspect ratio. Only rotations by multiples of 90 degrees are supported here.
+* **Fixed** issue #265 ("TypeError: insertText() got an unexpected keyword argument 'idx'"). This issue only occurred when using :meth:`Document.insertPage` with also inserting text.
+
+Changes in Version 1.14.10
+---------------------------
+* **Changed** :meth:`Page.showPDFpage` to support rotation of the source rectangle. Fixes #261 ("Cannot rotate insterted pages").
+* **Fixed** a bug in :meth:`Page.insertImage` which prevented insertion of multiple images provided as streams.
+
+
+Changes in Version 1.14.9
+---------------------------
+* **Added** new low-level method :meth:`Document._getTrailerString`, which returns the trailer object of a PDF. This is much like :meth:`Document._getXrefString` except that the PDF trailer has no / needs no :data:`xref` to identify it.
+* **Added** new parameters for text insertion methods. You can now set stroke and fill colors of glyphs (text characters) independently, as well as the thickness of the glyph border. A new parameter *render_mode* controls the use of these colors, and whether the text should be visible at all.
+* **Fixed** issue #258 ("Copying image streams to new PDF without size increase"): For JPX images embedded in a PDF, :meth:`Document.extractImage` will now return them in their original format. Previously, the MuPDF base library was used, which returns them in PNG format (entailing a massive size increase).
+* **Fixed** issue #259 ("Morphing text to fit inside rect"). Clarified use of :meth:`getTextlength` and removed extra line breaks for long words.
+
+Changes in Version 1.14.8
+---------------------------
+* **Added** :meth:`Pixmap.setRect` to change the pixel values in a rectangle. This is also an alternative to setting the color of a complete pixmap (:meth:`Pixmap.clearWith`).
+* **Fixed** an image extraction issue with JBIG2 (monochrome) encoded PDF images. The issue occurred in :meth:`Page.getText` (parameters "dict" and "rawdict") and in :meth:`Document.extractImage` methods.
+* **Fixed** an issue with not correctly clearing a non-alpha :ref:`Pixmap` (:meth:`Pixmap.clearWith`).
+* **Fixed** an issue with not correctly inverting colors of a non-alpha :ref:`Pixmap` (:meth:`Pixmap.invertIRect`).
+
+Changes in Version 1.14.7
+---------------------------
+* **Added** :meth:`Pixmap.setPixel` to change one pixel value.
+* **Added** documentation for image conversion in the :ref:`FAQ`.
+* **Added** new function :meth:`getTextlength` to determine the string length for a given font.
+* **Added** Postscript image output (changed :meth:`Pixmap.writeImage` and :meth:`Pixmap.getImageData`).
+* **Changed** :meth:`Pixmap.writeImage` and :meth:`Pixmap.getImageData` to ensure valid combinations of colorspace, alpha and output format.
+* **Changed** :meth:`Pixmap.writeImage`: the desired format is now inferred from the filename.
+* **Changed** FreeText annotations so that they can now have a transparent background (see :meth:`Annot.update`).
+
+Changes in Version 1.14.5
+---------------------------
+* **Changed:** :ref:`Shape` methods now strictly use the transformation matrix of the :ref:`Page` -- instead of "manually" calculating locations.
+* **Added** method :meth:`Pixmap.pixel` which returns the pixel value (a list) for given pixel coordinates.
+* **Added** method :meth:`Pixmap.getImageData` which returns a bytes object representing the pixmap in a variety of formats. Previously, this could be done for PNG outputs only (:meth:`Pixmap.getPNGData`).
+* **Changed:** output of methods :meth:`Pixmap.writeImage` and (the new) :meth:`Pixmap.getImageData` may now also be PSD (Adobe Photoshop Document).
+* **Added** method :meth:`Shape.drawQuad` which draws a :ref:`Quad`. This actually is a shorthand for a :meth:`Shape.drawPolyline` with the edges of the quad.
+* **Changed** method :meth:`Shape.drawOval`: the argument can now be **either** a rectangle (:data:`rect_like`) **or** a quadrilateral (:data:`quad_like`).
+
+Changes in Version 1.14.4
+---------------------------
+* **Fixed** issue #239 "Annotation coordinate consistency".
+
+
+Changes in Version 1.14.3
+---------------------------
+This patch version contains minor bug fixes and CJK font output support.
+
+* **Added** support for the four CJK fonts as PyMuPDF generated text output. This pertains to methods :meth:`Page.insertFont`, :meth:`Shape.insertText`, :meth:`Shape.insertTextbox`, and corresponding :ref:`Page` methods. The new fonts are available under "reserved" fontnames "china-t" (traditional Chinese), "china-s" (simplified Chinese), "japan" (Japanese), and "korea" (Korean).
+* **Added** full support for the built-in fonts 'Symbol' and 'Zapfdingbats'.
+* **Changed:** The 14 standard fonts can now each be referenced by a 4-letter abbreviation.
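The 4-letter abbreviations are simply aliases for the full Base-14 font names. The table below is how we recall the short names from the PyMuPDF :ref:`Base-14-Fonts` listing, so please verify it against the documentation of your version. The helper `full_font_name` is hypothetical and only illustrates the lookup.

```python
# The 14 standard PDF fonts keyed by their 4-letter short names
# (as we recall them from the PyMuPDF Base-14 table; verify for your version).
BASE14_ABBREVIATIONS = {
    "helv": "Helvetica",
    "heit": "Helvetica-Oblique",
    "hebo": "Helvetica-Bold",
    "hebi": "Helvetica-BoldOblique",
    "cour": "Courier",
    "coit": "Courier-Oblique",
    "cobo": "Courier-Bold",
    "cobi": "Courier-BoldOblique",
    "tiro": "Times-Roman",
    "tiit": "Times-Italic",
    "tibo": "Times-Bold",
    "tibi": "Times-BoldItalic",
    "symb": "Symbol",
    "zadb": "ZapfDingbats",
}


def full_font_name(name):
    """Expand a 4-letter abbreviation; any other name is returned unchanged."""
    return BASE14_ABBREVIATIONS.get(name.lower(), name)
```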
+
+Changes in Version 1.14.1
+---------------------------
+This patch version contains minor performance improvements.
+
+* **Added** support for :ref:`Document` filenames given as *pathlib* objects (they are converted with Python's *str()* function).
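The conversion described above is a one-line normalization step. This sketch uses a hypothetical helper `as_filename` to show it: path objects are turned into plain strings with *str()*, and strings pass through untouched.

```python
import pathlib


def as_filename(filename):
    """Return a plain str path for either a str or a pathlib path object.

    Hypothetical helper mirroring the str() conversion described above.
    """
    if isinstance(filename, pathlib.PurePath):
        return str(filename)
    return filename
```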
+
+
+Changes in Version 1.14.0
+---------------------------
+To support MuPDF v1.14.0, massive changes were required in PyMuPDF -- most of them purely technical, with little visibility to developers. But there are also quite a lot of interesting new and improved features. Following are the details:
+
+* **Added** "ink" annotation.
+* **Added** "rubber stamp" annotation.
+* **Added** "squiggly" text marker annotation.
+* **Added** new class :ref:`Quad` (quadrilateral or tetragon), which represents a general four-sided shape in the plane. The special subtype of rectangular, non-empty tetragons is used in text marker annotations and as the objects returned by text search methods.
+* **Added** a new option "decrypt" to :meth:`Document.save` and :meth:`Document.write`. Now you can **keep encryption** when saving a password protected PDF.
+* **Added** suppression and redirection of unsolicited messages issued by the underlying C-library MuPDF. Consult :ref:`RedirectMessages` for details.
+* **Changed:** Changes to annotations now **always require** :meth:`Annot.update` to become effective.
+* **Changed** free text annotations to support the full Latin character set and range of appearance options.
+* **Changed** text searching, :meth:`Page.searchFor`, to optionally return :ref:`Quad` instead of :ref:`Rect` objects surrounding each search hit.
+* **Changed** plain text output: we now add a *\n* to each line if it does not itself end with this character.
+* **Fixed** issue 211 ("Something wrong in the doc").
+* **Fixed** issue 213 ("Rewritten outline is displayed only by mupdf-based applications").
+* **Fixed** issue 214 ("PDF decryption GONE!").
+* **Fixed** issue 215 ("Formatting of links added with pyMuPDF").
+* **Fixed** issue 217 ("extraction through json is failing for my pdf").
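A quad is described by four corner points, and a search hit on rotated text is a rectangle that is not axis-parallel. The sketch below is a minimal stand-in, not PyMuPDF's :ref:`Quad` class; the method names `rect` and `is_rectangular` are our own. It computes the enclosing axis-parallel rectangle and tests for rectangularity: the diagonals must share a midpoint (parallelogram) and the corner at `ul` must be a right angle. Emptiness is not checked here.

```python
from dataclasses import dataclass
from typing import Tuple

Pt = Tuple[float, float]


@dataclass
class Quad:
    """Sketch of a quadrilateral given by its four corners."""
    ul: Pt  # upper left
    ur: Pt  # upper right
    ll: Pt  # lower left
    lr: Pt  # lower right

    @property
    def rect(self):
        """Smallest axis-parallel (x0, y0, x1, y1) containing all corners."""
        corners = (self.ul, self.ur, self.ll, self.lr)
        xs = [p[0] for p in corners]
        ys = [p[1] for p in corners]
        return (min(xs), min(ys), max(xs), max(ys))

    def is_rectangular(self, eps=1e-9):
        # Parallelogram test: diagonals ul-lr and ur-ll share a midpoint.
        if abs(self.ul[0] + self.lr[0] - self.ur[0] - self.ll[0]) > eps:
            return False
        if abs(self.ul[1] + self.lr[1] - self.ur[1] - self.ll[1]) > eps:
            return False
        # A parallelogram with a right angle at ul is a rectangle.
        top = (self.ur[0] - self.ul[0], self.ur[1] - self.ul[1])
        left = (self.ll[0] - self.ul[0], self.ll[1] - self.ul[1])
        return abs(top[0] * left[0] + top[1] * left[1]) <= eps
```

For a hit on text rotated by 45 degrees, the quad is still rectangular, but its enclosing rectangle is noticeably larger than the text itself. This is why returning quads can be more precise than returning rectangles.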
+
+Behind the curtain, we have changed the implementation of geometry objects: they now purely exist in Python and no longer have "shadow" twins on the C-level (in MuPDF). This has improved processing speed in that area by more than a factor of two.
+
+For the same reason, most methods involving geometry parameters now also accept the corresponding Python sequence. For example, in method *page.showPDFpage(rect, ...)*, parameter *rect* may now be any :data:`rect_like` sequence.
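To illustrate what accepting a :data:`rect_like` sequence means, the hypothetical helper `as_rect` below normalizes any sequence of four numbers into a tuple of floats. As an extra assumption of this sketch, it also accepts a pair of points, mirroring the *Rect(p1, p2)* form. This is not PyMuPDF's own conversion code.

```python
def as_rect(rect_like):
    """Normalize a rect_like value to (x0, y0, x1, y1) floats.

    Accepts four numbers, or (as an assumption of this sketch) two
    (x, y) point pairs for the top-left and bottom-right corners.
    """
    values = list(rect_like)
    if len(values) == 2:
        (x0, y0), (x1, y1) = values
        values = [x0, y0, x1, y1]
    if len(values) != 4:
        raise ValueError("rect_like needs 4 numbers or 2 points")
    return tuple(float(v) for v in values)
```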
+
+We also invested considerable effort to further extend and improve the :ref:`FAQ` chapter.
+
+
+Changes in Version 1.13.19
+---------------------------
+This version contains some technical / performance improvements and bug fixes.
+
+* **Changed** memory management: for Python 3 builds, Python memory management is exclusively used across all C-level code (i.e. no more native *malloc()* in MuPDF code or PyMuPDF interface code). This leads to improved memory usage profiles and also some runtime improvements: we have seen > 2% shorter runtimes for text extractions and pixmap creations (on Windows machines only to date).
+* **Fixed** an error occurring in Python 2.7, which crashed the interpreter when using :meth:`TextPage.extractRAWDICT` (= *Page.getText("rawdict")*).
+* **Fixed** an error occurring in Python 2.7, when creating link destinations.
+* **Extended** the :ref:`FAQ` chapter with more examples.
+
+Changes in Version 1.13.18
+---------------------------
+* **Added** method :meth:`TextPage.extractRAWDICT`, and a corresponding new string parameter "rawdict" to method :meth:`Page.getText`. It extracts text and images from a page in Python *dict* form like :meth:`TextPage.extractDICT`, but with the detail level of :meth:`TextPage.extractXML`, i.e. position information down to each individual character.
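Walking a "rawdict" result means descending through nested blocks, lines, spans and characters. The generator below is a sketch that assumes this nesting, with each character entry holding its character under *"c"* and its rectangle under *"bbox"*. Image blocks have no *"lines"* key and are skipped. The sample dictionary is hand-made and heavily trimmed, not real PyMuPDF output.

```python
def iter_chars(page_dict):
    """Yield (character, bbox) pairs from a "rawdict"-style dictionary.

    Assumed nesting: blocks -> lines -> spans -> chars, each char dict
    holding "c" and "bbox". Blocks without "lines" (images) are skipped.
    """
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                for char in span.get("chars", []):
                    yield char["c"], tuple(char["bbox"])


# Hypothetical, trimmed sample in the assumed shape.
sample = {
    "blocks": [
        {"type": 0, "lines": [{"spans": [{"chars": [
            {"c": "H", "bbox": (10, 10, 17, 22)},
            {"c": "i", "bbox": (17, 10, 20, 22)},
        ]}]}]},
        {"type": 1, "bbox": (0, 30, 100, 80)},  # image block: no "lines"
    ]
}

text = "".join(c for c, _ in iter_chars(sample))
```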
+
+Changes in Version 1.13.17
+---------------------------
+* **Fixed** an error that intermittently caused an exception in :meth:`Page.showPDFpage`, when pages from many different source PDFs were shown.
+* **Changed** method :meth:`Document.extractImage` to now return more meta information about the extracted image. Also, its performance has been greatly improved. Several demo scripts have been changed to make use of this method.
+* **Changed** method :meth:`Document._getXrefStream` to return *None* if the object is not a stream, instead of raising an exception.
+* **Added** method :meth:`Document._deleteObject` which deletes a PDF object identified by its :data:`xref`. Only to be used by the experienced PDF expert.
+* **Added** a method :meth:`PaperRect` which returns a :ref:`Rect` for a supplied paper format string. Example: *fitz.PaperRect("letter") = fitz.Rect(0.0, 0.0, 612.0, 792.0)*.
+* **Added** a :ref:`FAQ` chapter to this document.
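:meth:`PaperRect` turns a paper format name into a rectangle anchored at the origin, measured in PDF points (1/72 inch). The sketch below reproduces the documented *letter* example with a small, illustrative size table (A4 and A5 rounded to whole points). The function name `paper_rect` and its `landscape` flag are our own assumptions, not the library's signature.

```python
# Portrait (width, height) in PDF points; illustrative subset, A-sizes rounded.
PAPER_SIZES = {
    "a4": (595, 842),
    "a5": (420, 595),
    "letter": (612, 792),
    "legal": (612, 1008),
}


def paper_rect(name, landscape=False):
    """Return (x0, y0, x1, y1) for a paper format, similar to fitz.PaperRect."""
    width, height = PAPER_SIZES[name.lower()]
    if landscape:  # hypothetical flag: swap the sides
        width, height = height, width
    return (0.0, 0.0, float(width), float(height))
```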
+
+Changes in Version 1.13.16
+---------------------------
+* **Added** support for correctly setting transparency (opacity) for certain annotation types.
+* **Added** a tool property (:attr:`Tools.fitz_config`) showing the configuration of this PyMuPDF version.
+* **Fixed** issue #193 ('insertText(overlay=False) gives "cannot resize a buffer with shared storage" error') by avoiding read-only buffers.
+
+Changes in Version 1.13.15
+---------------------------
+* **Fixed** issue #189 ("cannot find builtin CJK font"), so we are supporting builtin CJK fonts now (CJK = China, Japan, Korea). This should lead to correctly generated pixmaps for documents using these languages. This change has consequences for our binary file size: it will now range between 8 and 10 MB, depending on the OS.
+* **Fixed** issue #191 ("Jupyter notebook kernel dies after ca. 40 pages"), which occurred when modifying the contents of an annotation.
+
+Changes in Version 1.13.14
+---------------------------
+This patch version contains several improvements, mainly for annotations.
+
+* **Changed** :attr:`Annot.lineEnds` is now a list of two integers representing the line end symbols. Previously, it was a *dict* of strings.
+* **Added** support of line end symbols for applicable annotations. PyMuPDF now can generate these annotations including the line end symbols.
+* **Added** :meth:`Annot.setLineEnds` adds line end symbols to applicable annotation types ('Line', 'PolyLine', 'Polygon').
+* **Changed** technical implementation of :meth:`Page.insertImage` and :meth:`Page.showPDFpage`: they now create their own contents objects, thereby avoiding changes of potentially large streams with consequential compression / decompression efforts and high change volumes with incremental updates.
+
+Changes in Version 1.13.13
+---------------------------
+This patch version contains several improvements for embedded files and file attachment annotations.
+
+* **Added** :meth:`Document.embeddedFileUpd` which allows changing **file content and metadata** of an embedded file. It supersedes the old method :meth:`Document.embeddedFileSetInfo` (which will be deleted in a future version). Content is automatically compressed and metadata may be unicode.
+* **Changed** :meth:`Document.embeddedFileAdd` to now automatically compress file content. Accompanying metadata can now be unicode (had to be ASCII in the past).
+* **Changed** :meth:`Document.embeddedFileDel` to now automatically delete **all entries** having the supplied identifying name. The return code is now an integer count of the removed entries (was *None* previously).
+* **Changed** embedded file methods to now also accept or show the PDF unicode filename as additional parameter *ufilename*.
+* **Added** :meth:`Page.addFileAnnot` which adds a new file attachment annotation.
+* **Changed** :meth:`Annot.fileUpd` (file attachment annot) to now also accept the PDF unicode *ufilename* parameter. The description parameter *desc* correctly works with unicode. Furthermore, **all** parameters are optional, so metadata may be changed without also replacing the file content.
+* **Changed** :meth:`Annot.fileInfo` (file attachment annot) to now also show the PDF unicode filename as parameter *ufilename*.
+* **Fixed** issue #180 ("page.getText(output='dict') return invalid bbox") to now also work for vertical text.
+* **Fixed** issue #185 ("Can't render the annotations created by PyMuPDF"). The issue's cause was the minimalistic MuPDF approach when creating annotations. Several annotation types have no */AP* ("appearance") object when created by MuPDF functions. MuPDF, SumatraPDF and hence also PyMuPDF cannot render annotations without such an object. This fix now ensures that an appearance object is always created together with the annotation itself. We still do not support line end styles.
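The new :meth:`Document.embeddedFileDel` behavior removes every entry with the given name and reports how many were removed. The sketch below models this with a hypothetical list of *(name, content)* pairs standing in for a PDF's embedded-file list. It shows the semantics only, not the actual implementation.

```python
def delete_embedded(entries, name):
    """Remove all entries whose name equals `name`; return the count removed.

    `entries` is a hypothetical list of (name, content) pairs standing in
    for a PDF's embedded-file list; it is modified in place.
    """
    before = len(entries)
    entries[:] = [entry for entry in entries if entry[0] != name]
    return before - len(entries)


entries = [("report", b"v1"), ("data", b"csv"), ("report", b"v2")]
```

Returning a count rather than *None* lets a caller tell "nothing matched" (0) apart from a successful deletion.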
+
+Changes in Version 1.13.12
+---------------------------
+* **Fixed** issue #180 ("page.getText(output='dict') return invalid bbox"). Note that this is a workaround for a MuPDF error, which generates zero-height character rectangles in some cases. When this happens, this fix ensures a bbox height of at least the fontsize.
+* **Changed** for ListBox and ComboBox widgets, the attribute list of selectable values has been renamed to :attr:`Widget.choice_values`.
+* **Changed** when adding widgets, any missing :ref:`Base-14-Fonts` font is automatically added to the PDF. Widget text fonts can now also be chosen from existing widget fonts. Any specified field values are now honored and lead to a field with a preset value.
+* **Added** :meth:`Annot.updateWidget` which allows changing existing form fields -- including the field value.
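The issue #180 workaround guarantees a minimum bbox height equal to the fontsize. The helper below is a sketch of that rule. Growing the box downward from its top edge *y0* is our own assumption; PyMuPDF may position the enlarged box differently.

```python
def ensure_min_height(bbox, fontsize):
    """Return bbox (x0, y0, x1, y1) with a height of at least `fontsize`.

    Assumption: a too-flat box is extended downward from its top edge y0.
    """
    x0, y0, x1, y1 = bbox
    if y1 - y0 < fontsize:
        y1 = y0 + fontsize
    return (x0, y0, x1, y1)
```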
+
+Changes in Version 1.13.11
+---------------------------
+While the preceding patch subversions only contained various fixes, this version again introduces major new features:
+
+* **Added** basic support for PDF widget annotations. You can now add PDF form fields of types Text, CheckBox, ListBox and ComboBox. Where necessary, the PDF is transformed to a Form PDF with the first added widget.
+* **Fixed** issues #176 ("wrong file embedding"), #177 ("segment fault when invoking page.getText()") and #179 ("Segmentation fault using page.getLinks() on encrypted PDF").
+
+
+Changes in Version 1.13.7
+--------------------------
+* **Added** support of variable page sizes for reflowable documents (e-books, HTML, etc.): new parameters *rect* and *fontsize* in :ref:`Document` creation (open), and as a separate method :meth:`Document.layout`.
+* **Added** :ref:`Annot` creation of many annotation types: sticky notes, free text, circle, rectangle, line, polygon, polyline and text markers.
+* **Added** support of annotation transparency (:attr:`Annot.opacity`, :meth:`Annot.setOpacity`).
+* **Changed** :attr:`Annot.vertices`: point coordinates are now grouped as pairs of floats (no longer as separate floats).
+* **Changed** annotation colors dictionary: the two keys are now named *"stroke"* (formerly *"common"*) and *"fill"*.
+* **Added** :attr:`Document.isDirty` which is *True* if a PDF has been changed in this session. Reset to *False* on each :meth:`Document.save` or :meth:`Document.write`.
+
+Changes in Version 1.13.6
+--------------------------
+* Fix #173: for memory-resident documents, ensure the stream object will not be garbage-collected by Python before document is closed.
+
+Changes in Version 1.13.5
+--------------------------
+* New low-level method :meth:`Page._setContents` defines an object given by its :data:`xref` to serve as the :data:`contents` object.
+* Changed and extended PDF form field support: the attribute *widget_text* has been renamed to :attr:`Annot.widget_value`. Values of all form field types (except signatures) are now supported. A new attribute :attr:`Annot.widget_choices` contains the selectable values of listboxes and comboboxes. All these attributes now contain *None* if no value is present.
+
+Changes in Version 1.13.4
+--------------------------
+* :meth:`Document.convertToPDF` now supports page ranges, reversed page sequences and page rotation. If the document already is a PDF, an exception is raised.
+* Fixed a bug (introduced with v1.13.0) that prevented :meth:`Page.insertImage` for transparent images.
+
+Changes in Version 1.13.3
+--------------------------
+Introduces a way to convert **any MuPDF supported document** to a PDF. If you ever wanted PDF versions of your XPS, EPUB, CBZ or FB2 files -- here is a way to do this.
+
+* :meth:`Document.convertToPDF` returns a Python *bytes* object in PDF format. It can be opened in PyMuPDF like any other document, or be written to disk with the *".pdf"* extension.
+
+Changes in Version 1.13.2
+--------------------------
+The major enhancement is PDF form field support. Form fields are annotations of type *(19, 'Widget')*. There is a new document method to check whether a PDF is a form. The :ref:`Annot` class has new properties describing field details.
+
+* :attr:`Document.isFormPDF` is true if object type */AcroForm* and at least one form field exists.
+* :attr:`Annot.widget_type`, :attr:`Annot.widget_text` and :attr:`Annot.widget_name` contain the details of a form field (i.e. a "Widget" annotation).
+
+Changes in Version 1.13.1
+--------------------------
+* :meth:`TextPage.extractDICT` is a new method to extract the contents of a document page (text and images). All document types are supported, as with the other :ref:`TextPage` *extract*()* methods. The returned object is a dictionary of nested lists and other dictionaries, and **exactly equal** to the JSON-deserialization of the old :meth:`TextPage.extractJSON`. The difference is that the result is created directly -- no JSON module is used. Because the user needs no JSON module to interpret the information, it should be easier to use. It should also perform better, because it contains images in their original **binary format** -- they need not be base64-decoded.
+* :meth:`Page.getText` correspondingly supports the new parameter value *"dict"* to invoke the above method.
+* :meth:`TextPage.extractJSON` (resp. *Page.getText("json")*) is still supported for convenience, but its use is expected to decline.
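The practical difference lies in how images arrive. In the JSON output, image bytes are base64 text that must be decoded after parsing, while the "dict" output already delivers bytes. The sketch below handles the JSON case. It assumes image blocks are marked with *"type": 1* and carry their data under *"image"*. The JSON fragment is made up for illustration, not captured from :meth:`TextPage.extractJSON`.

```python
import base64
import json


def images_from_json_output(json_text):
    """Decode image blocks of "json"-style output into raw bytes.

    Assumes image blocks have "type": 1 and a base64 string under "image";
    with the "dict" output the same key would already hold bytes.
    """
    page = json.loads(json_text)
    return [
        base64.b64decode(block["image"])
        for block in page.get("blocks", [])
        if block.get("type") == 1
    ]


# Made-up fragment standing in for real JSON output.
fake_png = b"\x89PNG\r\n\x1a\n"
json_text = json.dumps(
    {"blocks": [{"type": 1, "image": base64.b64encode(fake_png).decode("ascii")}]}
)
```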
+
+Changes in Version 1.13.0
+--------------------------
+This version is based on MuPDF v1.13.0. This release is "primarily a bug fix release".
+
+In PyMuPDF, we are also doing some bug fixes while introducing minor enhancements. There are only very minimal changes to the user's API.
+
+* :ref:`Document` construction is more flexible: the new *filetype* parameter allows setting the document type. If specified, any extension in the filename will be ignored. More completely addresses `issue #156 <https://github.com/pymupdf/PyMuPDF/issues/156>`_. As part of this, the documentation has been reworked.
+
+* Changes to :ref:`Pixmap` constructors:
+    - Colorspace conversion no longer allows dropping the alpha channel: source and target **alpha will now always be the same**. We have seen exceptions and even interpreter crashes when using *alpha = 0*.
+    - As a replacement, the simple pixmap copy lets you choose the target alpha.
+
+* :meth:`Document.save` again offers the full garbage collection range 0 through 4. Because of a bug in :data:`xref` maintenance, we had to temporarily enforce *garbage > 1*. Finally resolves `issue #148 <https://github.com/pymupdf/PyMuPDF/issues/148>`_.
+
+* :meth:`Document.save` now offers to "prettify" PDF source via an additional argument.
+* :meth:`Page.insertImage` has the additional *stream* \-parameter, specifying a memory area holding an image.
+
+* An issue with garbled PNGs on Linux systems has been resolved (`"Problem writing PNG" #133 <https://github.com/pymupdf/PyMuPDF/issues/133>`_).
+
+
+Changes in Version 1.12.4
+--------------------------
+This is an extension of 1.12.3.
+
+* Fix of `issue #147 <https://github.com/pymupdf/PyMuPDF/issues/147>`_: methods :meth:`Document.getPageFontlist` and :meth:`Document.getPageImagelist` now also show fonts and images contained in :data:`resources` nested via "Form XObjects".
+* Temporary fix of `issue #148 <https://github.com/pymupdf/PyMuPDF/issues/148>`_: Saving to new PDF files will now automatically use *garbage = 2* if a lower value is given. A final fix is expected with MuPDF's next version. At that point we will remove this workaround.
+* Preventive fix of illegally using stencil / image mask pixmaps in some methods.
+* Method :meth:`Document.getPageFontlist` now includes the encoding name for each font in the list.
+* Method :meth:`Document.getPageImagelist` now includes the decode method name for each image in the list.
+
+Changes in Version 1.12.3
+--------------------------
+This is an extension of 1.12.2.
+
+* Many functions now return *None* instead of *0*, if the result has no other meaning than just indicating successful execution (:meth:`Document.close`, :meth:`Document.save`, :meth:`Document.select`, :meth:`Pixmap.writePNG` and many others).
+
+Changes in Version 1.12.2
+--------------------------
+This is an extension of 1.12.1.
+
+* Method :meth:`Page.showPDFpage` now accepts the new *clip* argument. This specifies an area of the source page to which the display should be restricted.
+
+* New properties :attr:`Page.CropBox` and :attr:`Page.MediaBox` have been added for convenience.
+
+
+Changes in Version 1.12.1
+--------------------------
+This is an extension of version 1.12.0.
+
+* New method :meth:`Page.showPDFpage` displays a page of another PDF. This is a **vector** image and therefore remains precise across zooming. Both involved documents must be PDF.
+
+* New method :meth:`Page.getSVGimage` creates an SVG image from the page. In contrast to the raster image of a pixmap, this is a vector image format. The return is a unicode text string, which can be saved in a *.svg* file.
+
+* Method :meth:`Page.getTextBlocks` now accepts an additional bool parameter "images". If set to true (default is false), image blocks (metadata only) are included in the produced list and thus allow detecting areas with rendered images.
+
+* Minor bug fixes.
+
+* The "text" output of :meth:`Page.getText` concatenates all lines within a block using a single space character. MuPDF's original uses "\\n" instead, producing rather ragged output.
+
+* New properties of :ref:`Page` objects :attr:`Page.MediaBoxSize` and :attr:`Page.CropBoxPosition` provide more information about a page's dimensions. For non-PDF files (and for most PDF files, too) these will be equal to :attr:`Page.rect.bottom_right`, resp. :attr:`Page.rect.top_left`. For example, class :ref:`Shape` makes use of them to correctly position its items.
+
+Changes in Version 1.12.0
+--------------------------
+This version is based on and requires MuPDF v1.12.0. The new MuPDF version contains quite a number of changes -- most of them around text extraction. Some of the changes impact the programmer's API.
+
+* :meth:`Outline.saveText` and :meth:`Outline.saveXML` have been deleted without replacement. You probably haven't used them much anyway. But if you are looking for a replacement: the output of :meth:`Document.getToC` can easily be used to produce something equivalent.
+
+* Class *TextSheet* no longer exists.
+
+* Text "spans" (one of the hierarchy levels of :ref:`TextPage`) no longer contain positioning information (i.e. no "bbox" key). Instead, spans now provide the font information for their text. This impacts our JSON output variant.
+
+* HTML output has improved considerably: it now creates valid documents which browsers can display with a view similar to the original document.
+
+* There is a new output format XHTML, which provides text and images in a browser-readable format. Unlike HTML output, it makes no effort to reproduce the original layout.
+
+* All output formats of :meth:`Page.getText` now support creating complete, valid documents, by wrapping them with appropriate header and trailer information. If you are interested in using the HTML output, please make sure to read :ref:`HTMLQuality`.
+
+* To support finding text positions, we have added special methods that don't need detours like :meth:`TextPage.extractJSON` or :meth:`TextPage.extractXML`: use :meth:`Page.getTextBlocks` or :meth:`Page.getTextWords` to create lists of text blocks or words, respectively, each accompanied by its rectangle. This should be much faster than the standard text extraction methods and also avoids using additional packages for interpreting their output.
+
+
+Changes in Version 1.11.2
+--------------------------
+This is an extension of v1.11.1.
+
+* New :meth:`Page.insertFont` creates a PDF */Font* object and returns its object number.
+
+* New :meth:`Document.extractFont` extracts the content of an embedded font given its object number.
+
+* Items of the **FontList(...)** methods no longer contain the PDF generation number. This value never had any significance. Instead, the font file extension is included (e.g. "pfa" for a "PostScript Font for ASCII"), which is more valuable information.
+
+* Fonts other than "simple fonts" (Type1) are now also supported.
+
+* New options to change :ref:`Pixmap` size:
+
+    * Method :meth:`Pixmap.shrink` reduces the pixmap proportionally in place.
+
+    * A new :ref:`Pixmap` copy constructor allows scaling via setting target width and height.
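:meth:`Pixmap.shrink` reduces both dimensions by the same power of two. The arithmetic is sketched below under two assumptions: that `shrink(factor)` divides width and height by `2**factor` (with `factor < 1` leaving the pixmap unchanged), and that partial blocks at the right and bottom edges still yield a pixel (rounding up). Please verify the rounding against your MuPDF version. `shrunk_size` is a hypothetical helper.

```python
def shrunk_size(width, height, factor):
    """Return (width, height) after shrinking by 2**factor in each direction.

    Assumptions: factor < 1 is a no-op; partial edge blocks round up.
    """
    if factor < 1:
        return width, height
    divisor = 2 ** factor
    # Ceiling division for positive integers: -(-a // b).
    return -(-width // divisor), -(-height // divisor)
```

Because the pixel count drops by a factor of `4**factor`, even `factor = 1` already cuts memory use to about a quarter.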
+
+
+Changes in Version 1.11.1
+--------------------------------
+This is an extension of v1.11.0.
+
+* New class *Shape*. It facilitates and extends the creation of image shapes on PDF pages. It contains multiple methods for creating elementary shapes like lines, rectangles or circles, which can be combined into more complex ones and be given common properties like line width or colors. Combined shapes are handled as a unit and can, e.g., be "morphed" together. The class can accumulate multiple complex shapes and put them all in the page's foreground or background -- thus also reducing the number of updates to the page's :data:`contents` object.
+
+* All *Page* draw methods now use the new *Shape* class.
+
+* Text insertion methods *insertText()* and *insertTextBox()* now support morphing in addition to text rotation. They have become part of the *Shape* class and thus allow text to be freely combined with graphics.
+
+* A new *Pixmap* constructor allows creating pixmap copies with an added alpha channel. A new method also allows directly manipulating alpha values.
+
+* Binary algebraic operations with geometry objects (matrices, rectangles and points) now generally also support lists or tuples as the second operand. You can add a tuple *(x, y)* of numbers to a :ref:`Point`. In this context, such sequences are called ":data:`point_like`" (resp. :data:`matrix_like`, :data:`rect_like`).
+
+* Geometry objects now fully support in-place operators. For example, *p /= m* replaces point p with *p * 1/m* for a number, or *p * ~m* for a :data:`matrix_like` object *m*. Similarly, if *r* is a rectangle, then *r |= (3, 4)* is the new rectangle that also includes *fitz.Point(3, 4)*, and *r &= (1, 2, 3, 4)* is its intersection with *fitz.Rect(1, 2, 3, 4)*.
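The in-place rectangle operators map directly onto Python's `__ior__` and `__iand__` hooks. This minimal class is a sketch of the described semantics, not PyMuPDF's :ref:`Rect`. `|=` grows the rectangle to include a point or another rectangle, and `&=` shrinks it to the intersection. Disjoint rectangles leave an invalid (empty) result here, and this sketch does not normalize it.

```python
class Rect:
    """Minimal rectangle supporting |= (include) and &= (intersect)."""

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def __iter__(self):
        return iter((self.x0, self.y0, self.x1, self.y1))

    def __ior__(self, other):
        other = tuple(other)
        if len(other) == 2:        # point_like: a degenerate rectangle
            other = other + other
        x0, y0, x1, y1 = other
        self.x0, self.y0 = min(self.x0, x0), min(self.y0, y0)
        self.x1, self.y1 = max(self.x1, x1), max(self.y1, y1)
        return self

    def __iand__(self, other):
        x0, y0, x1, y1 = tuple(other)   # rect_like
        self.x0, self.y0 = max(self.x0, x0), max(self.y0, y0)
        self.x1, self.y1 = min(self.x1, x1), min(self.y1, y1)
        return self


r = Rect(0, 0, 2, 2)
r |= (3, 4)           # now (0, 0, 3, 4): includes the point (3, 4)
r &= (1, 2, 3, 4)     # now (1, 2, 3, 4): the intersection
```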
+
+Changes in Version 1.11.0
+--------------------------------
+This version is based on and requires MuPDF v1.11.
+
+Though MuPDF has declared it mostly a bug fix version, one major new feature is indeed included: support of embedded files -- also called portfolios or collections. We have extended PyMuPDF to support this, going a little beyond the *mutool* utility, as follows.
+
+* The *Document* class now supports embedded files with several new methods and one new property:
+
+    - *embeddedFileInfo()* returns metadata information about an entry in the list of embedded files. This is more than *mutool* currently provides: it shows all the information that was used to embed the file (not just the entry's name).
+    - *embeddedFileGet()* retrieves the (decompressed) content of an entry into a *bytes* buffer.
+    - *embeddedFileAdd(...)* inserts new content into the PDF portfolio. We (in contrast to *mutool*) **restrict** this to entries with a **new name** (no duplicate names allowed).
+    - *embeddedFileDel(...)* deletes an entry from the portfolio (function not offered in MuPDF).
+    - *embeddedFileSetInfo()* -- changes filename or description of an embedded file.
+    - *embeddedFileCount* -- contains the number of embedded files.
+
+* Several enhancements deal with streamlining geometry objects. These are not connected to the new MuPDF version and most of them are also reflected in PyMuPDF v1.10.0. Among them are new properties to identify the corners of rectangles by name (e.g. *Rect.bottom_right*) and new methods to deal with set-theoretic questions like *Rect.contains(x)* or *IRect.intersects(x)*. Special effort focused on supporting more "Pythonic" language constructs: *if x in rect ...* is equivalent to *rect.contains(x)*.
+
+* The :ref:`Rect` chapter now has more background on empty and infinite rectangles and how we handle them. The handling itself was also updated for more consistency in this area.
+
+* We have started basic support for **generation** of PDF content:
+
+    - *Document.insertPage()* adds a new page into a PDF, optionally containing some text.
+    - *Page.insertImage()* places a new image on a PDF page.
+    - *Page.insertText()* puts new text on an existing page.
+
+* For **FileAttachment** annotations, content and name of the attached file can be extracted and changed.
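The "Pythonic" *x in rect* form works because Python evaluates `x in obj` by calling `obj.__contains__(x)`. The sketch below wires that hook to a `contains` method for both points and rectangles. It is a stand-in class, not PyMuPDF's, and counting boundary points as inside is our own choice.

```python
class Rect:
    """Minimal rectangle where `x in r` delegates to r.contains(x)."""

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def contains(self, x):
        """True if x (an (x, y) point or a 4-number rect) lies inside.

        Boundary points count as inside; that choice is ours.
        """
        x = tuple(x)
        if len(x) == 2:
            px, py = x
            return self.x0 <= px <= self.x1 and self.y0 <= py <= self.y1
        x0, y0, x1, y1 = x
        # A rectangle is inside if both opposite corners are inside.
        return self.contains((x0, y0)) and self.contains((x1, y1))

    def __contains__(self, x):
        # Python calls this for the expression `x in rect`.
        return self.contains(x)
```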
+
+Changes in Version 1.10.0
+-------------------------------
+
+MuPDF v1.10 Impact
+~~~~~~~~~~~~~~~~~~~~~~~~
+MuPDF version 1.10 has a significant impact on our bindings. Some of the changes also affect the API -- in other words, **you** as a PyMuPDF user.
+
+* Link destination information has been reduced. Several properties of the *linkDest* class no longer contain valuable information. In fact, this class as a whole has been deleted from MuPDF's library and we in PyMuPDF only maintain it to provide compatibility with existing code.
+
+* In an effort to minimize memory requirements, several improvements have been built into MuPDF v1.10:
+
+    - A new *config.h* file can be used to de-select unwanted features in the C base code. Using this feature we have been able to reduce the size of our binary *_fitz.o* / *_fitz.pyd* by about 50% (from 9 MB to 4.5 MB). When UPX-ing this, the size goes even further down to a very handy 2.3 MB.
+
+    - The alpha (transparency) channel for pixmaps is now optional. Letting alpha default to *False* significantly reduces pixmap sizes (by 20% for CMYK, 25% for RGB, 50% for GRAY). Many *Pixmap* constructors therefore now accept an *alpha* boolean to control inclusion of this channel. Other pixmap constructors (e.g. those for file and image input) create pixmaps with no alpha altogether. On the downside, save methods for pixmaps no longer accept a *savealpha* option: this channel will always be saved when present. To minimize code breaks, we have left this parameter in the call patterns -- it will just be ignored.
+
+* *DisplayList* and *TextPage* class constructors now **require the mediabox** of the page they are referring to (i.e. the *page.bound()* rectangle). There is no way to construct this information from other sources, therefore a source code change cannot be avoided in these cases. We assume, however, that not many users actually employ these rather low-level classes explicitly, so the impact of this change should be minor.
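The size savings quoted for optional alpha follow from simple arithmetic. A pixmap stores one byte per component per pixel, so dropping the alpha channel removes one of `colorants + 1` components. The sketch assumes 8 bits per component. `samples_size` and `alpha_savings` are hypothetical helpers.

```python
def samples_size(width, height, colorants, alpha):
    """Bytes in a pixmap's sample buffer, assuming 1 byte per component."""
    n = colorants + (1 if alpha else 0)
    return width * height * n


def alpha_savings(colorants):
    """Fraction of memory saved by omitting the alpha channel."""
    with_alpha = samples_size(1, 1, colorants, True)
    without_alpha = samples_size(1, 1, colorants, False)
    return (with_alpha - without_alpha) / with_alpha


for name, colorants in (("CMYK", 4), ("RGB", 3), ("GRAY", 1)):
    print(f"{name}: {alpha_savings(colorants):.0%}")
# CMYK: 20%
# RGB: 25%
# GRAY: 50%
```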
+
+Other Changes compared to Version 1.9.3
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+* The new :ref:`Document` method *write()* writes an opened PDF to memory (as opposed to a file, like *save()* does).
+* An annotation can now be scaled and moved around on its page. This is done by modifying its rectangle.
+* Annotations can now be deleted. :ref:`Page` contains the new method *deleteAnnot()*.
+* Various annotation attributes can now be modified, e.g. content, dates, title (= author), border, colors.
+* Method *Document.insertPDF()* now also copies annotations of source pages.
+* The *Pages* class has been deleted. As documents can now be accessed with page numbers as indices (like *doc[n] == doc.loadPage(n)*), and document objects can be used as iterators, the benefit of this class was too low to maintain it. See the following comments.
+* *loadPage(n)* / *doc[n]* now accept arbitrary integers to specify a page number, as long as *n < pageCount*. So, e.g. *doc[-500]* is always valid and will load page *(-500) % pageCount*.
+* A document can now also be used as an iterator like this: *for page in doc: ...<do something with "page"> ...*. This will yield all pages of *doc* as *page*.
+* The :ref:`Pixmap` method *getSize()* has been replaced with property *size*. As before *Pixmap.size == len(Pixmap)* is true.
+* In response to transparency (alpha) being optional, several new parameters and properties have been added to :ref:`Pixmap` and :ref:`Colorspace` classes to support determining their characteristics.
+* The :ref:`Page` class now contains new properties *firstAnnot* and *firstLink* to provide starting points to the respective class chains, where *firstLink* is just a mnemonic synonym to method *loadLinks()* which continues to exist. Similarly, the new property *rect* is a synonym for method *bound()*, which also continues to exist.
+* :ref:`Pixmap` methods *samplesRGB()* and *samplesAlpha()* have been deleted because pixmaps can now be created without transparency.
+* :ref:`Rect` now has a property *irect* which is a synonym of method *round()*. Likewise, :ref:`IRect` now has property *rect* to deliver a :ref:`Rect` which has the same coordinates as float values.
+* Document has the new method *searchPageFor()* to search for a text string. It works exactly like the corresponding *Page.searchFor()* with page number as additional parameter.
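The page-number rule for *loadPage(n)* / *doc[n]* is Python's modulo operator applied to any *n < pageCount*, so negative numbers wrap around from the end. The function below is a sketch of that rule only. Its name `resolve_page_number` is hypothetical and it is not PyMuPDF code.

```python
def resolve_page_number(n, page_count):
    """Map any integer n < page_count to the page that doc[n] would load.

    Follows the rule quoted above: page n % page_count, so negative
    numbers wrap around (n = -1 is the last page).
    """
    if page_count <= 0:
        raise IndexError("document has no pages")
    if n >= page_count:
        raise IndexError("page number out of range")
    return n % page_count
```

For a 7-page document, *doc[-500]* therefore loads page `-500 % 7 == 4`.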
+
+
+Changes in Version 1.9.3
+----------------------------------
+This version is also based on MuPDF v1.9a. Changes compared to version 1.9.2:
+
+* As a major enhancement, annotations are now supported in a similar way as links. Annotations can be displayed (as pixmaps) and their properties can be accessed.
+* In addition to the document *select()* method, some simpler methods can now be used to manipulate a PDF:
+
+    - *copyPage()* copies a page within a document.
+    - *movePage()* is similar, but deletes the original.
+    - *deletePage()* deletes a page
+    - *deletePageRange()* deletes a page range
+
+* *rotation* or *setRotation()* access or change a PDF page's rotation, respectively.
+* Available but undocumented before, :ref:`IRect`, :ref:`Rect`, :ref:`Point` and :ref:`Matrix` support the *len()* method and their coordinate properties can be accessed via indices, e.g. *IRect.x1 == IRect[2]*.
+* For convenience, documents now support simple indexing: *doc.loadPage(n) == doc[n]*. The index may be anywhere in the range *-pageCount < n < pageCount*, so *doc[-1]* is the last page of the document.
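Supporting both named coordinates and *len()* / index access, as in *IRect.x1 == IRect[2]*, is what Python's sequence protocol provides. A `NamedTuple` is the shortest way to sketch this. Note one difference: unlike PyMuPDF's geometry classes, a `NamedTuple` is immutable, so this is an illustration of the access pattern only.

```python
from typing import NamedTuple


class IRect(NamedTuple):
    """Sketch: named coordinates that also support len() and indexing."""
    x0: int
    y0: int
    x1: int
    y1: int


r = IRect(1, 2, 3, 4)
```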
+
+Changes in Version 1.9.2
+------------------------------
+This version is also based on MuPDF v1.9a. Changes compared to version 1.9.1:
+
+* *fitz.open()* (no parameters) creates a new empty **PDF** document, i.e. if saved afterwards, it must be given a *.pdf* extension.
+* :ref:`Document` now accepts all of the following formats (*Document* and *open* are synonyms):
+
+  - *open()*,
+  - *open(filename)* (equivalent to *open(filename, None)*),
+  - *open(filetype, area)* (equivalent to *open(filetype, stream = area)*).
+
+  The memory area *stream* may be of type *bytes* or *bytearray*. Thus, e.g. *area = open("file.pdf", "rb").read()* may be used directly (without first converting it to bytearray).
+* New method *Document.insertPDF()* (PDFs only) inserts a range of pages from another PDF.
+* *Document* objects doc now support the *len()* function: *len(doc) == doc.pageCount*.
+* New method *Document.getPageImageList()* creates a list of images used on a page.
+* New method *Document.getPageFontList()* creates a list of fonts referenced by a page.
+* New pixmap constructor *fitz.Pixmap(doc, xref)* creates a pixmap based on an opened PDF document and an :data:`xref` number of the image.
+* New pixmap constructor *fitz.Pixmap(cspace, spix)* creates a pixmap as a copy of another one *spix* with the colorspace converted to *cspace*. This works for all colorspace combinations.
+* Pixmap constructor *fitz.Pixmap(colorspace, width, height, samples)* now allows *samples* to also be *bytes*, not only *bytearray*.
+
+
+Changes in Version 1.9.1
+----------------------------
+This version of PyMuPDF is based on MuPDF library source code version 1.9a published on April 21, 2016.
+
+Please have a look at MuPDF's website to see which changes and enhancements are contained herein.
+
+Changes in version 1.9.1 compared to version 1.8.0 are the following:
+
+* New method *getRectArea()* for both *fitz.Rect* and *fitz.IRect*.
+* Pixmaps can now be created directly from files using the new constructor *fitz.Pixmap(filename)*.
+* The Pixmap constructor *fitz.Pixmap(image)* has been extended accordingly.
+* *fitz.Rect* can now be created with all possible combinations of points and coordinates.
+* PyMuPDF classes and methods now all contain ``__doc__`` strings, most of them created automatically by SWIG. While the PyMuPDF documentation is certainly more detailed, this feature should help a lot when programming in Python-aware IDEs.
+* New document method *getPermits()* returns the permissions associated with the current access to the document (print, edit, annotate, copy) as a Python dictionary.
+* The identity matrix *fitz.Identity* is now **immutable**.
+* The new document method *select(list)* removes all pages from a document that are not contained in the list. Pages can also be duplicated and re-arranged.
+* Various improvements and new members in our demo and examples collections. Perhaps most prominently: *PDF_display* now supports scrolling with the mouse wheel, and there is a new example program *wxTableExtract* which lets you graphically identify and extract table data in documents.
+* *fitz.open()* is now an alias of *fitz.Document()*.
+* New pixmap method *getPNGData()* returns a bytearray formatted as a PNG image of the pixmap.
+* New pixmap method *samplesRGB()* providing a *samples* version with alpha bytes stripped off (RGB colorspaces only).
+* New pixmap method *samplesAlpha()* providing only the alpha bytes of the *samples* area.
+* New iterator *fitz.Pages(doc)* over a document's set of pages.
+* New matrix methods *invert()* (calculate inverted matrix), *concat()* (calculate matrix product), *preTranslate()* (perform a shift operation).
+* New *IRect* methods *intersect()* (intersection with another rectangle), *translate()* (perform a shift operation).
+* New *Rect* methods *intersect()* (intersection with another rectangle), *transform()* (transformation with a matrix), *includePoint()* (enlarge rectangle to also contain a point), *includeRect()* (enlarge rectangle to also contain another one).
+* Documented *Point.transform()* (transform a point with a matrix).
+* *Matrix*, *IRect*, *Rect* and *Point* classes now support compact, algebraic formulations for manipulating such objects.
+* Incremental saves for changes are possible now using the call pattern *doc.save(doc.name, incremental=True)*.
+* A PDF's metadata can now be deleted, set or changed by document method *setMetadata()*. Supports incremental saves.
+* A PDF's bookmarks (or table of contents) can now be deleted, set or changed with the entries of a list using document method *setToC(list)*. Supports incremental saves.
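The following pure-Python sketch models the effect of *select(list)* on the page sequence. Pages are represented as a plain list of labels. It is not PyMuPDF's implementation, which rewrites the PDF page tree, and `select_pages` is a hypothetical name.

```python
def select_pages(pages, selection):
    """Hypothetical model of Document.select(list) on a list of page labels.

    Pages not named in `selection` are dropped; named pages may appear
    in any order and more than once.
    """
    count = len(pages)
    for pno in selection:
        if not 0 <= pno < count:
            raise ValueError(f"page number {pno} out of range")
    return [pages[pno] for pno in selection]


pages = ["title", "intro", "body", "appendix"]
print(select_pages(pages, [3, 0, 0]))  # ['appendix', 'title', 'title']
```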
diff --git a/docs/classes.rst b/docs/classes.rst

new file mode 100644 (file)

index 0000000..1347379
--- /dev/null
+++ b/docs/classes.rst
@@ -0,0 +1,28 @@
+============
+Classes
+============
+
+.. toctree::
+   :maxdepth: 2
+
+   annot
+   colorspace
+   displaylist
+   document
+   font
+   identity
+   irect
+   link
+   linkdest
+   matrix
+   outline
+   page
+   pixmap
+   point
+   quad
+   rect
+   shape
+   textpage
+   textwriter
+   tools
+   widget
diff --git a/docs/colors.rst b/docs/colors.rst

new file mode 100644 (file)

index 0000000..510ae69
--- /dev/null
+++ b/docs/colors.rst
@@ -0,0 +1,43 @@
+.. _ColorDatabase:
+
+================
+Color Database
+================
+With the introduction of methods involving colors (like :meth:`Page.drawCircle`), it may be useful to have access to predefined colors.
+
+The fabulous GUI package `wxPython <https://wxpython.org/>`_ has a database of over 540 predefined RGB colors, which are given more or less memorable names. Among them are not only standard names like "green" or "blue", but also "turquoise", "skyblue", and 100 (not only 50 ...) shades of "gray", etc.
+
+We have taken the liberty of copying this database (a list of tuples) into PyMuPDF in modified form, making its colors available as PDF-compatible float triples: for wxPython's *("WHITE", 255, 255, 255)* we return *(1, 1, 1)*, which can be used directly in *color* and *fill* parameters. Names are matched case-insensitively, so a mixed-case spelling like "wHiTe" also finds the color.
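The following sketch illustrates the conversion described above. It is not PyMuPDF's code: `lookup_color` is a hypothetical helper, and only three entries (taken from this page) are included. Each integer component is divided by 255, and names are upper-cased before lookup.

```python
# Entries in wxPython's (NAME, R, G, B) format, taken from this page.
COLOR_INFO = [
    ("WHITE", 255, 255, 255),
    ("ALICEBLUE", 240, 248, 255),
    ("AQUAMARINE", 127, 255, 212),
]

# Scale 0..255 integers to PDF's 0..1 floats, keyed by the upper-case name.
COLORS = {name: (r / 255, g / 255, b / 255) for name, r, g, b in COLOR_INFO}


def lookup_color(name):
    """Hypothetical helper: return the float triple for a color name, ignoring case."""
    return COLORS[name.upper()]


print(lookup_color("wHiTe"))  # (1.0, 1.0, 1.0)
```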
+
+Function *getColor()*
+------------------------
+As the color database may not be needed very often, one additional import statement seems acceptable to get access to it::
+
+    >>> # "getColor" is the only method you really need
+    >>> from fitz.utils import getColor
+    >>> getColor("aliceblue")
+    (0.9411764705882353, 0.9725490196078431, 1.0)
+    >>> #
+    >>> # to get a list of all existing names
+    >>> from fitz.utils import getColorList
+    >>> cl = getColorList()
+    >>> cl
+    ['ALICEBLUE', 'ANTIQUEWHITE', 'ANTIQUEWHITE1', 'ANTIQUEWHITE2', 'ANTIQUEWHITE3',
+    'ANTIQUEWHITE4', 'AQUAMARINE', 'AQUAMARINE1'] ...
+    >>> #
+    >>> # to see the full integer color coding
+    >>> from fitz.utils import getColorInfoList
+    >>> il = getColorInfoList()
+    >>> il
+    [('ALICEBLUE', 240, 248, 255), ('ANTIQUEWHITE', 250, 235, 215),
+    ('ANTIQUEWHITE1', 255, 239, 219), ('ANTIQUEWHITE2', 238, 223, 204),
+    ('ANTIQUEWHITE3', 205, 192, 176), ('ANTIQUEWHITE4', 139, 131, 120),
+    ('AQUAMARINE', 127, 255, 212), ('AQUAMARINE1', 127, 255, 212)] ...
+
+
+Printing the Color Database
+----------------------------
+If you want to see what the many available colors actually look like, use the scripts `colordbRGB.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/colordbRGB.py>`_ or `colordbHSV.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/colordbHSV.py>`_ in the examples directory. They create PDFs (which already exist in the same directory) showing all these colors. They differ only in sort order: one sorts by RGB values, the other by Hue-Saturation-Value.
+This is a screen print of what these files look like.
+
+.. image:: images/img-colordb.png
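The two scripts differ only in their sort key. The following sketch is an illustration, not the scripts' actual code. It shows how the same color entries order differently when sorted by their RGB triple versus their HSV triple, using the standard `colorsys` module.

```python
import colorsys

# A few (NAME, R, G, B) entries from the color database listing above.
colors = [
    ("AQUAMARINE", 127, 255, 212),
    ("ALICEBLUE", 240, 248, 255),
    ("ANTIQUEWHITE4", 139, 131, 120),
]


def rgb_key(entry):
    _, r, g, b = entry
    return (r, g, b)


def hsv_key(entry):
    _, r, g, b = entry
    # colorsys expects floats in 0..1 and returns (hue, saturation, value)
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


rgb_order = [name for name, *_ in sorted(colors, key=rgb_key)]
hsv_order = [name for name, *_ in sorted(colors, key=hsv_key)]
print(rgb_order)  # ['AQUAMARINE', 'ANTIQUEWHITE4', 'ALICEBLUE']
print(hsv_order)  # ['ANTIQUEWHITE4', 'AQUAMARINE', 'ALICEBLUE']
```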
diff --git a/docs/colorspace.rst b/docs/colorspace.rst

new file mode 100644 (file)

index 0000000..891154f
--- /dev/null
+++ b/docs/colorspace.rst
@@ -0,0 +1,39 @@
+.. _Colorspace:
+
+================
+Colorspace
+================
+
+Represents the color space of a :ref:`Pixmap`.
+
+
+**Class API**
+
+.. class:: Colorspace
+
+   .. method:: __init__(self, n)
+
+      Constructor
+
+      :arg int n: A number identifying the colorspace. Possible values are :data:`CS_RGB`, :data:`CS_GRAY` and :data:`CS_CMYK`.
+
+   .. attribute:: name
+
+      The name identifying the colorspace. Example: *fitz.csCMYK.name == 'DeviceCMYK'*.
+
+      :type: str
+
+   .. attribute:: n
+
+      The number of bytes required to define the color of one pixel. Example: *fitz.csCMYK.n == 4*.
+
+      :type: int
+
+
+    **Predefined Colorspaces**
+
+    To save some typing, there are predefined colorspace objects for the three available cases.
+
+    * :data:`csRGB`  = *fitz.Colorspace(fitz.CS_RGB)*
+    * :data:`csGRAY` = *fitz.Colorspace(fitz.CS_GRAY)*
+    * :data:`csCMYK` = *fitz.Colorspace(fitz.CS_CMYK)*
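Because *n* is the number of bytes per pixel, it determines how large a pixmap's sample area is. The following sketch rests on several assumptions. It uses 8 bits per component, no row padding, and one extra byte per pixel when alpha is present. Only the CMYK name and *n == 4* are stated on this page; the gray and RGB names are MuPDF's standard device colorspace names. `samples_size` is a hypothetical helper.

```python
# Colorspace.n (bytes per pixel) for the three predefined colorspaces.
COLORSPACE_N = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def samples_size(cs_name, width, height, alpha=False):
    """Bytes in a pixmap's samples area, assuming 8 bits per component,
    no row padding, and one extra byte per pixel when alpha is set."""
    bytes_per_pixel = COLORSPACE_N[cs_name] + (1 if alpha else 0)
    return width * height * bytes_per_pixel


print(samples_size("DeviceCMYK", 100, 50))  # 20000
print(samples_size("DeviceRGB", 100, 50))   # 15000
```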
diff --git a/docs/conf.py b/docs/conf.py

new file mode 100644 (file)

index 0000000..88966e4
--- /dev/null
+++ b/docs/conf.py
@@ -0,0 +1,248 @@
+# -*- coding: utf-8 -*-
+#
+import sys
+import os
+import sphinx_rtd_theme
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+# sys.path.insert(0, os.path.abspath('.'))
+
+# -- General configuration ------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+# needs_sphinx = "3.1"
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    "sphinx.ext.autodoc",
+    # "sphinx.ext.todo",
+    "sphinx.ext.coverage",
+    "sphinx.ext.ifconfig",
+    # "sphinx.ext.imgmath",
+]
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ["_templates"]
+
+# The suffix of source filenames.
+# source_suffix = ".rst"
+
+# The encoding of source files.
+# source_encoding = 'utf-8-sig'
+
+# The master toctree document.
+master_doc = "index"
+
+# General information about the project.
+project = "PyMuPDF"
+copyright = "2015-2020, Jorj X. McKie"
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The full version, including alpha/beta/rc tags.
+release = "1.17.4"
+
+# The short X.Y version
+version = release
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+# language = None
+
+# There are two options for replacing |today|: either, you set today to some
+# non-false value, then it is used:
+# today = ''
+# Else, today_fmt is used as the format for a strftime call.
+# today_fmt = '%B %d, %Y'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+exclude_patterns = ["_build"]
+
+# The reST default role (used for this markup: `text`) to use for all
+# documents.
+default_role = None
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+add_module_names = True
+
+# If true, sectionauthor and moduleauthor directives will be shown in the
+# output. They are ignored by default.
+show_authors = False
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = "sphinx"
+
+# A list of ignored prefixes for module index sorting.
+modindex_common_prefix = []
+
+# If true, keep warnings as "system message" paragraphs in the built documents.
+keep_warnings = False
+
+
+# -- Options for HTML output ----------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+# html_theme = "agogo"
+# html_theme = "sphinxdoc"
+# html_theme = "python_docs_theme"
+html_theme = "sphinx_rtd_theme"
+# html_theme = "classic"
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+html_theme_options = {
+    # "root_name": "",
+    # "root_url": "",
+    # "root_icon": "pymupdf.ico",
+    # "sidebarbgcolor": "gray",
+}
+
+# Add any paths that contain custom themes here, relative to this directory.
+# html_theme_path = []
+# html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
+
+# The name for this set of Sphinx documents.  If None, it defaults to
+# "<project> v<release> documentation".
+# html_title = None
+
+# A shorter title for the navigation bar.  Default is the same as html_title.
+# html_short_title = None
+
+# The name of an image file (relative to this directory) to place at the top
+# of the sidebar.
+# html_logo = "images/img-pymupdf.jpg"
+
+# The name of an image file (within the static path) to use as favicon of the
+# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
+# pixels large.
+html_favicon = "Pymupdf.ico"
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ["_static"]
+
+# Add any extra paths that contain custom files (such as robots.txt or
+# .htaccess) here, relative to this directory. These files are copied
+# directly to the root of the documentation.
+# html_extra_path = []
+
+# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
+# using the given strftime format.
+html_last_updated_fmt = "%d. %b %Y"
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+# html_use_smartypants = False
+
+# Custom sidebar templates, maps document names to template names.
+# html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# template names.
+html_additional_pages = {}
+
+# If false, no module index is generated.
+html_domain_indices = True
+
+# If false, no index is generated.
+html_use_index = True
+
+# If true, the index is split into individual pages for each letter.
+html_split_index = True
+
+# If true, links to the reST sources are added to the pages.
+html_show_sourcelink = True
+html_sourcelink_suffix = ".rst"
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
+html_show_sphinx = True
+
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
+html_show_copyright = True
+
+# If true, an OpenSearch description file will be output, and all pages will
+# contain a <link> tag referring to it.  The value of this option must be the
+# base URL from which the finished HTML is served.
+# html_use_opensearch = "https://pymupdf.readthedocs.io/en/latest"
+
+# This is the file name suffix for HTML files (e.g. ".xhtml").
+# html_file_suffix = ".html"
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = "PyMuPDF"
+
+
+# -- Options for LaTeX output ---------------------------------------------
+latex_elements = {
+    # "fontpkg": r"\usepackage[sfdefault]{ClearSans} \usepackage[T1]{fontenc}"
+}
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title,
+#  author, documentclass [howto, manual, or own class]).
+latex_documents = [
+    ("index", "PyMuPDF.tex", u"PyMuPDF Documentation", u"Jorj X. McKie", "manual")
+]
+# The name of an image file (relative to this directory) to place at the top of
+# the title page.
+latex_logo = "images/img-pymupdf.jpg"
+
+# For "manual" documents, if this is true, then toplevel headings are parts,
+# not chapters.
+# latex_use_parts = False
+
+# If true, show page references after internal links.
+latex_show_pagerefs = False
+
+# If true, show URL addresses after external links.
+# latex_show_urls = True
+# latex_use_xindy = True
+# Documents to append as an appendix to all manuals.
+# latex_appendices = []
+
+# If false, no module index is generated.
+latex_domain_indices = True
+
+# -- Options for PDF output --------------------------------------------------
+# Grouping the document tree into PDF files. List of tuples
+# (source start file, target name, title, author).
+
+pdf_documents = [("index", "PyMuPDF", "PyMuPDF Manual", "Jorj McKie")]
+
+# A comma-separated list of custom stylesheets. Example:
+pdf_stylesheets = ["sphinx", "bahnschrift"]
+
+# Create a compressed PDF
+pdf_compressed = True
+
+# A colon-separated list of folders to search for fonts. Example:
+# pdf_font_path=['/usr/share/fonts', '/usr/share/texmf-dist/fonts/']
+
+# Language to be used for hyphenation support
+pdf_language = "en_US"
+
+# If false, no index is generated.
+pdf_use_index = True
+
+# If false, no modindex is generated.
+pdf_use_modindex = True
+
+# If false, no coverpage is generated.
+pdf_use_coverpage = True
+
+pdf_break_level = 2
+
+pdf_verbosity = 0
+pdf_invariant = True
diff --git a/docs/coop_low.rst b/docs/coop_low.rst

new file mode 100644 (file)

index 0000000..02146c9
--- /dev/null
+++ b/docs/coop_low.rst
@@ -0,0 +1,71 @@
+
+.. _cooperation:
+
+===============================================================
+Working together: DisplayList and TextPage
+===============================================================
+Here are some instructions on how to use these classes together.
+
+In some situations, performance can be improved by falling back to the lower level of detail explained here.
+
+Create a DisplayList
+---------------------
+A :ref:`DisplayList` represents an interpreted document page. Behind the scenes, the methods for pixmap creation, text extraction and text search all use the page's display list to perform their tasks. If a page must be rendered several times (e.g. because of changed zoom levels), or if both text search and text extraction are needed, overhead can be saved by creating the display list only once and then using it for all other tasks.
+
+>>> dl = page.getDisplayList()              # create the display list
+
+You can also create display lists for many pages in advance and keep them in a list, for example when the document is opened or during idle times. Alternatively, you can store a page's display list when the page is visited for the first time (e.g. in GUI scripts).
+
+Note that for everything that follows, only the display list is needed: the corresponding :ref:`Page` object could already have been deleted.
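For the strategy of storing a display list when a page is first visited, a small memoizing wrapper is enough. The following is a sketch only. `DisplayListCache` is a hypothetical helper. `build` is any callable that maps a page number to a display list, e.g. `lambda pno: doc[pno].getDisplayList()` with PyMuPDF. Here a stand-in builder that returns strings is used, so the sketch runs without fitz.

```python
class DisplayListCache:
    """Hypothetical helper: build each page's display list on first use, then reuse it.

    `build` maps a page number to a display list, e.g.
    lambda pno: doc[pno].getDisplayList() with PyMuPDF.
    """

    def __init__(self, build):
        self._build = build
        self._cache = {}

    def get(self, pno):
        # Create the display list only the first time this page is requested.
        if pno not in self._cache:
            self._cache[pno] = self._build(pno)
        return self._cache[pno]


# Stand-in builder that records each call, so the sketch runs without fitz.
calls = []
cache = DisplayListCache(lambda pno: calls.append(pno) or f"display list {pno}")
cache.get(2)
cache.get(2)
print(calls)  # [2]: the second request was served from the cache
```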
+
+Generate Pixmap
+------------------
+The following creates a Pixmap from a :ref:`DisplayList`. Parameters are the same as for :meth:`Page.getPixmap`.
+
+>>> pix = dl.getPixmap()                    # create the page's pixmap
+
+The execution time of this statement may be up to 50% shorter than that of :meth:`Page.getPixmap`.
+
+Perform Text Search
+---------------------
+With the display list from above, we can also search for text.
+
+For this we need to create a :ref:`TextPage`.
+
+>>> tp = dl.getTextPage()                    # display list from above
+>>> rlist = tp.search("needle")              # look up "needle" locations
+>>> for r in rlist:                          # work with the found locations, e.g.
+...     pix.invertIRect(r.irect)             # invert colors in the rectangles
+
+Extract Text
+----------------
+With the same :ref:`TextPage` object from above, we can now immediately use any or all of the five text extraction methods.
+
+.. note:: Above, we created our text page without an argument. This leads to a default argument of 3 (ligatures and white space are preserved), i.e. images will **not** be extracted; see below.
+
+>>> txt  = tp.extractText()                  # plain text format
+>>> json = tp.extractJSON()                  # json format
+>>> html = tp.extractHTML()                  # HTML format
+>>> xml  = tp.extractXML()                   # XML format
+>>> xhtml = tp.extractXHTML()                # XHTML format
+
+Further Performance improvements
+---------------------------------
+Pixmap
+~~~~~~~
+As explained in the :ref:`Page` chapter:
+
+If you do not need transparency, set *alpha = 0* when creating pixmaps. This saves 25% of memory in the most common (RGB) case and possibly 5% of execution time (depending on the GUI software).
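The 25% figure follows from the byte count per pixel. An RGB pixel takes 3 bytes without alpha and 4 bytes with it, so dropping alpha saves 1 byte in 4. This small sketch (`alpha_savings` is a hypothetical helper) computes the same ratio for the other colorspaces, assuming one byte per component.

```python
def alpha_savings(n):
    """Fraction of sample memory saved by alpha=0 when each pixel has
    n color bytes (n + 1 bytes with an alpha channel)."""
    return 1 - n / (n + 1)


for name, n in [("Gray", 1), ("RGB", 3), ("CMYK", 4)]:
    print(f"{name}: {alpha_savings(n):.0%} saved")
# Gray: 50% saved, RGB: 25% saved, CMYK: 20% saved
```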
+
+TextPage
+~~~~~~~~~
+If you do not need images extracted alongside the text of a page, you can set the following option:
+
+>>> flags = fitz.TEXT_PRESERVE_LIGATURES | fitz.TEXT_PRESERVE_WHITESPACE
+>>> tp = dl.getTextPage(flags)
+
+This will save about 25% of overall execution time for the HTML, XHTML and JSON text extractions and **hugely** reduce the amount of storage (both memory and disk space) if the document is graphics-oriented.
+
+If you do need images, however, use a value of 7 for flags:
+
+>>> flags = fitz.TEXT_PRESERVE_LIGATURES | fitz.TEXT_PRESERVE_WHITESPACE | fitz.TEXT_PRESERVE_IMAGES
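The values 3 and 7 used above are bitwise ORs of the *TEXT_PRESERVE_** constants. This sketch copies the constants locally with the bit values 1, 2 and 4, which match those totals, so it runs without PyMuPDF. With PyMuPDF installed, you would use *fitz.TEXT_PRESERVE_LIGATURES* and so on instead. `keeps_images` is a hypothetical helper.

```python
# Local copies of the TEXT_PRESERVE_* bit values (use fitz.TEXT_PRESERVE_* in real code).
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4

default_flags = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
with_images = default_flags | TEXT_PRESERVE_IMAGES


def keeps_images(flags):
    """True if the images bit is set in flags."""
    return bool(flags & TEXT_PRESERVE_IMAGES)


print(default_flags, with_images)                              # 3 7
print(keeps_images(default_flags), keeps_images(with_images))  # False True
```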
diff --git a/docs/device.rst b/docs/device.rst

new file mode 100644 (file)

index 0000000..7c4ed3b
--- /dev/null
+++ b/docs/device.rst
@@ -0,0 +1,33 @@
+.. _Device:
+
+================
+Device
+================
+
+The different format handlers (pdf, xps, etc.) interpret pages to a "device". Devices are the basis for everything that can be done with a page: rendering, text extraction and searching. The device type is determined by the selected construction method.
+
+**Class API**
+
+.. class:: Device
+
+   .. method:: __init__(self, object, clip)
+
+      Constructor for either a pixel map or a display list device.
+
+      :arg object: either a *Pixmap* or a *DisplayList*.
+      :type object: :ref:`Pixmap` or :ref:`DisplayList`
+
+      :arg clip: An optional :ref:`IRect` for *Pixmap* devices to restrict rendering to a certain area of the page. If the complete page is required, specify *None*. For display list devices, this parameter must be omitted.
+      :type clip: :ref:`IRect`
+
+   .. method:: __init__(self, textpage, flags=0)
+
+      Constructor for a text page device.
+
+      :arg textpage: *TextPage* object
+      :type textpage: :ref:`TextPage`
+
+      :arg int flags: control how text is parsed into the text page. Currently 3 options can be coded into this parameter, see :ref:`TextPreserve`. To set these options, use something like *flags=0 | TEXT_PRESERVE_LIGATURES | ...*.
+
+.. note:: In higher level code (:meth:`Page.getText`, :meth:`Document.getPageText`), the following decisions for creating text devices have been implemented: (1) *TEXT_PRESERVE_LIGATURES* and *TEXT_PRESERVE_WHITESPACE* are always set, (2) *TEXT_PRESERVE_IMAGES* is set for JSON and HTML, otherwise off.
+
diff --git a/docs/displaylist.rst b/docs/displaylist.rst

new file mode 100644 (file)

index 0000000..b02be11
--- /dev/null
+++ b/docs/displaylist.rst
@@ -0,0 +1,92 @@
+.. _DisplayList:
+
+================
+DisplayList
+================
+
+A DisplayList is a list containing drawing commands (text, images, etc.). Its purpose is twofold:
+
+1. as a caching mechanism to reduce parsing of a page
+2. as a data structure in multi-threading setups, where one thread parses the page and another one renders pages. This aspect is currently not supported by PyMuPDF.
+
+A display list is populated with objects from a page, usually by executing :meth:`Page.getDisplayList`. There also exists an independent constructor.
+
+"Replay" the list (once or many times) by invoking one of its methods :meth:`~DisplayList.run`, :meth:`~DisplayList.getPixmap` or :meth:`~DisplayList.getTextPage`.
+
+
+================================= ============================================
+**Method**                        **Short Description**
+================================= ============================================
+:meth:`~DisplayList.run`          Run a display list through a device.
+:meth:`~DisplayList.getPixmap`    Generate a pixmap.
+:meth:`~DisplayList.getTextPage`  Generate a text page.
+:attr:`~DisplayList.rect`         Mediabox of the display list.
+================================= ============================================
+
+
+**Class API**
+
+.. class:: DisplayList
+
+   .. method:: __init__(self, mediabox)
+
+      Create a new display list.
+
+      :arg mediabox: The page's rectangle.
+      :type mediabox: :ref:`Rect`
+
+      :rtype: *DisplayList*
+
+   .. method:: run(device, matrix, area)
+    
+      Run the display list through a device. The recorded drawing commands are replayed to the device, which then performs its task (e.g. image creation or text extraction). In this way, a page can be "read" many times without having to re-interpret it from the document file.
+
+      You will most probably use one of the specialized run methods below instead: :meth:`getPixmap` or :meth:`getTextPage`.
+
+      :arg device: Device
+      :type device: :ref:`Device`
+
+      :arg matrix: Transformation matrix to apply to the display list contents.
+      :type matrix: :ref:`Matrix`
+
+      :arg area: Only the part visible within this area will be considered when the list is run through the device.
+      :type area: :ref:`Rect`
+
+   .. index::
+      pair: matrix; getPixmap
+      pair: colorspace; getPixmap
+      pair: clip; getPixmap
+      pair: alpha; getPixmap
+
+   .. method:: getPixmap(matrix=fitz.Identity, colorspace=fitz.csRGB, alpha=0, clip=None)
+
+      Run the display list through a draw device and return a pixmap.
+
+      :arg matrix: matrix to use. Default is the identity matrix.
+      :type matrix: :ref:`Matrix`
+
+      :arg colorspace: the desired colorspace. Default is RGB.
+      :type colorspace: :ref:`Colorspace`
+
+      :arg int alpha: determine whether or not (0, default) to include a transparency channel.
+
+      :arg clip: an area of the full mediabox to which the pixmap should be restricted.
+      :type clip: :ref:`IRect` or :ref:`Rect`
+
+      :rtype: :ref:`Pixmap`
+      :returns: pixmap of the display list.
+
+   .. method:: getTextPage(flags)
+
+      Run the display list through a text device and return a text page.
+
+      :arg int flags: control which information is parsed into a text page. Default value in PyMuPDF is **3 = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE**, i.e. ligatures are **passed through**, white spaces are **passed through** (not translated to spaces), and images are **not included**. See :ref:`TextPreserve`.
+
+      :rtype: :ref:`TextPage`
+      :returns: text page of the display list.
+
+   .. attribute:: rect
+
+      Contains the display list's mediabox. This will equal the page's rectangle if it was created via :meth:`Page.getDisplayList`.
+
+      :type: :ref:`Rect`
diff --git a/docs/document.rst b/docs/document.rst

new file mode 100644 (file)

index 0000000..8167205
--- /dev/null
+++ b/docs/document.rst
@@ -0,0 +1,1120 @@
+.. _Document:
+
+================
+Document
+================
+
+.. highlight:: python
+
+This class represents a document. It can be constructed from a file or from memory.
+
+Since version 1.9.0 there exists the alias *open* for this class, i.e. ``fitz.Document(...)`` and ``fitz.open(...)`` do exactly the same thing.
+
+For details on **embedded files** refer to Appendix 3.
+
+.. note::
+
+  Starting with v1.17.0, a new page addressing mechanism for **EPUB files only** is supported. This document type is internally organized in chapters such that pages can most efficiently be found by their so-called "location". The location is a tuple *(chapter, pno)* consisting of the chapter number and the page number **in that chapter**. Both numbers are zero-based.
+
+  While it is still possible to locate a page via its (absolute) number, doing so may require laying out the complete document before the page can be addressed. This may have a significant performance impact if the document is very large. Due to internal EPUB file structures, using the page's **location** *(chapter, pno)* avoids this.
+
+  To maintain a consistent API, PyMuPDF supports page *location* syntax for **all file types**: documents without this feature simply have one chapter. :meth:`Document.loadPage` and the equivalent index access now also support using the page *location*. There are a number of methods to convert between page numbers and locations, determine the chapter count and the page count per chapter, and compute the next, previous and last page locations of a document.
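The following pure-Python sketch shows how a page location relates to an absolute page number. It converts between the two, given the page count of each chapter. The helpers `to_location` and `to_page_number` are hypothetical and not PyMuPDF API. In PyMuPDF, the per-chapter counts would come from Document.chapterPageCount.

```python
from bisect import bisect_right
from itertools import accumulate


def to_location(pno, chapter_page_counts):
    """Hypothetical: absolute page number -> (chapter, pno within chapter)."""
    # starts[i] is the absolute number of chapter i's first page; starts[-1] is the total.
    starts = [0, *accumulate(chapter_page_counts)]
    if not 0 <= pno < starts[-1]:
        raise IndexError("page number out of range")
    # The last chapter starting at or before pno (this also skips empty chapters).
    chapter = bisect_right(starts, pno) - 1
    return chapter, pno - starts[chapter]


def to_page_number(location, chapter_page_counts):
    """Hypothetical: (chapter, pno within chapter) -> absolute page number."""
    chapter, pno = location
    return sum(chapter_page_counts[:chapter]) + pno


counts = [3, 2, 4]  # pages per chapter
print(to_location(3, counts))          # (1, 0)
print(to_page_number((2, 3), counts))  # 8
```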
+
+======================================= ==========================================================
+**Method / Attribute**                  **Short Description**
+======================================= ==========================================================
+:meth:`Document.authenticate`           gain access to an encrypted document
+:meth:`Document.can_save_incrementally` check if incremental save is possible
+:meth:`Document.chapterPageCount`       number of pages in chapter
+:meth:`Document.close`                  close the document
+:meth:`Document.convertToPDF`           write a PDF version to memory
+:meth:`Document.copyPage`               PDF only: copy a page reference
+:meth:`Document.deletePage`             PDF only: delete a page
+:meth:`Document.deletePageRange`        PDF only: delete a page range
+:meth:`Document.embeddedFileAdd`        PDF only: add a new embedded file from buffer
+:meth:`Document.embeddedFileCount`      PDF only: number of embedded files
+:meth:`Document.embeddedFileDel`        PDF only: delete an embedded file entry
+:meth:`Document.embeddedFileGet`        PDF only: extract an embedded file buffer
+:meth:`Document.embeddedFileInfo`       PDF only: metadata of an embedded file
+:meth:`Document.embeddedFileNames`      PDF only: list of embedded files
+:meth:`Document.embeddedFileUpd`        PDF only: change an embedded file
+:meth:`Document.findBookmark`           retrieve page location after re-layout
+:meth:`Document.fullcopyPage`           PDF only: duplicate a page
+:meth:`Document.getPageFontList`        PDF only: make a list of fonts on a page
+:meth:`Document.getPageImageList`       PDF only: make a list of images on a page
+:meth:`Document.getPagePixmap`          create a pixmap of a page by page number
+:meth:`Document.getPageText`            extract the text of a page by page number
+:meth:`Document.getPageXObjectList`     PDF only: make a list of XObjects on a page
+:meth:`Document.getSigFlags`            PDF only: determine signature state
+:meth:`Document.getToC`                 create a table of contents
+:meth:`Document.insertPage`             PDF only: insert a new page
+:meth:`Document.insertPDF`              PDF only: insert pages from another PDF
+:meth:`Document.layout`                 re-paginate the document (if supported)
+:meth:`Document.loadPage`               read a page
+:meth:`Document.makeBookmark`           create a page pointer in reflowable documents
+:meth:`Document.metadataXML`            PDF only: :data:`xref` of XML metadata
+:meth:`Document.movePage`               PDF only: move a page to a different location in doc
+:meth:`Document.need_appearances`       PDF only: get/set */NeedAppearances* property
+:meth:`Document.newPage`                PDF only: insert a new empty page
+:meth:`Document.nextLocation`           return (chapter, pno) of following page
+:meth:`Document.pages`                  iterator over a page range
+:meth:`Document.PDFCatalog`             PDF only: :data:`xref` of catalog (root)
+:meth:`Document.PDFTrailer`             PDF only: trailer source
+:meth:`Document.previousLocation`       return (chapter, pno) of preceding page
+:meth:`Document.reload_page`            PDF only: provide a new copy of a page
+:meth:`Document.save`                   PDF only: save the document
+:meth:`Document.saveIncr`               PDF only: save the document incrementally
+:meth:`Document.scrub`                  PDF only: remove sensitive data
+:meth:`Document.searchPageFor`          search for a string on a page
+:meth:`Document.select`                 PDF only: select a subset of pages
+:meth:`Document.setMetadata`            PDF only: set the metadata
+:meth:`Document.setToC`                 PDF only: set the table of contents (TOC)
+:meth:`Document.updateObject`           PDF only: replace object source
+:meth:`Document.updateStream`           PDF only: replace stream source
+:meth:`Document.write`                  PDF only: write the document to memory
+:meth:`Document.xrefObject`             PDF only: object source at the :data:`xref`
+:meth:`Document.xrefStream`             PDF only: decompressed stream source at :data:`xref`
+:meth:`Document.xrefStreamRaw`          PDF only: raw stream source at :data:`xref`
+:attr:`Document.chapterCount`           number of chapters
+:attr:`Document.FormFonts`              PDF only: list of global widget fonts
+:attr:`Document.isClosed`               has document been closed?
+:attr:`Document.isDirty`                PDF only: has document been changed yet?
+:attr:`Document.isEncrypted`            document (still) encrypted?
+:attr:`Document.isFormPDF`              is this a Form PDF?
+:attr:`Document.isPDF`                  is this a PDF?
+:attr:`Document.isReflowable`           is this a reflowable document?
+:attr:`Document.lastLocation`           (chapter, pno) of last page
+:attr:`Document.metadata`               metadata
+:attr:`Document.name`                   filename of document
+:attr:`Document.needsPass`              require password to access data?
+:attr:`Document.outline`                first `Outline` item
+:attr:`Document.pageCount`              number of pages
+:attr:`Document.permissions`            permissions to access the document
+======================================= ==========================================================
+
+**Class API**
+
+.. class:: Document
+
+    .. index::
+       pair: filename; open
+       pair: stream; open
+       pair: filetype; open
+       pair: rect; open
+       pair: width; open
+       pair: height; open
+       pair: fontsize; open
+       pair: open; Document
+       pair: filename; Document
+       pair: stream; Document
+       pair: filetype; Document
+       pair: rect; Document
+       pair: fontsize; Document
+
+    .. method:: __init__(self, filename=None, stream=None, filetype=None, rect=None, width=0, height=0, fontsize=11)
+
+      Creates a *Document* object.
+
+      * With default parameters, a **new empty PDF** document will be created.
+      * If *stream* is given, then the document is created from memory and either *filename* or *filetype* must indicate its type.
+      * If *stream* is *None*, then a document is created from the file given by *filename*. Its type is inferred from the extension, which can be overruled by specifying *filetype*.
+
+      :arg str,pathlib filename: A UTF-8 string or *pathlib* object containing a file path (or a file type, see below).
+
+      :arg bytes,bytearray,BytesIO stream: A memory area containing a supported document. Its type **must** be specified by either *filename* or *filetype*.
+
+         *(Changed in version 1.14.13)* *io.BytesIO* is now also supported.
+
+      :arg str filetype: A string specifying the type of document. This may be something looking like a filename (e.g. "x.pdf"), in which case MuPDF uses the extension to determine the type, or a mime type like *application/pdf*. Just using strings like "pdf" will also work.
+
+      :arg rect_like rect: a rectangle specifying the desired page size. This parameter is only meaningful for documents with a variable page layout ("reflowable" documents), like e-books or HTML, and ignored otherwise. If specified, it must be a non-empty, finite rectangle with top-left coordinates (0, 0). Together with parameter *fontsize*, it determines the layout of each page and hence also the number of pages.
+
+      :arg float width: may be used together with *height* as an alternative to *rect* to specify layout information.
+
+      :arg float height: may be used together with *width* as an alternative to *rect* to specify layout information.
+
+      :arg float fontsize: the default fontsize for reflowable document types. It is used to calculate the page layout and is ignored unless *rect* or both *width* and *height* are specified.
+
+      Overview of possible forms (using the *open* synonym of *Document*)::
+
+          >>> # from a file
+          >>> doc = fitz.open("some.pdf")
+          >>> doc = fitz.open("some.file", None, "pdf")  # copes with wrong extension
+          >>> doc = fitz.open("some.file", filetype="pdf")  # copes with wrong extension
+          >>> 
+          >>> # from memory
+          >>> doc = fitz.open("pdf", mem_area)
+          >>> doc = fitz.open(None, mem_area, "pdf")
+          >>> doc = fitz.open(stream=mem_area, filetype="pdf")
+          >>> 
+          >>> # new empty PDF
+          >>> doc = fitz.open()
+          >>> 
+
+    .. method:: authenticate(password)
+
+      Decrypts the document with the string *password*. If successful, document data can be accessed. For PDF documents, the "owner" and the "user" have different privileges, and hence different passwords may exist for these authorization levels. The method will automatically establish the appropriate access rights for the provided password.
+
+      :arg str password: owner or user password.
+
+      :rtype: int
+      :returns: a positive value if successful, zero otherwise. If successful, the indicator *isEncrypted* is set to *False*. Positive return codes carry the following information detail:
+
+        * bit 0 set => no password required -- happens if the method was used although :attr:`needsPass` was zero.
+        * bit 1 set => **user** password authenticated
+        * bit 2 set => **owner** password authenticated
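The bit layout of the return code can be decoded with plain integer masking. This is a minimal sketch: `describe_auth_result` is a hypothetical helper, not part of PyMuPDF. It only interprets an integer such as the one returned by `doc.authenticate(password)`.

```python
from typing import List


def describe_auth_result(rc: int) -> List[str]:
    """Translate a return code of Document.authenticate() into words.

    Hypothetical helper: it only interprets the documented bits 0-2.
    """
    if rc == 0:
        return ["authentication failed"]
    meanings = []
    if rc & 0b001:  # bit 0
        meanings.append("no password required")
    if rc & 0b010:  # bit 1
        meanings.append("user password authenticated")
    if rc & 0b100:  # bit 2
        meanings.append("owner password authenticated")
    return meanings


# With PyMuPDF (not executed here): rc = doc.authenticate("secret")
print(describe_auth_result(6))
```

Several bits can be set simultaneously, so the helper checks each bit independently rather than comparing against fixed values.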
+
+
+    .. method:: makeBookmark(loc)
+
+      *(New in v.1.17.3)* Return a page pointer in a reflowable document. After the document has been laid out again (see :meth:`layout`), the result of this method can be used to find the new location of the page.
+
+      .. note:: Do not confuse with items of a table of contents, TOC.
+
+      :arg list,tuple loc: page location. Must be a valid *(chapter, pno)*.
+
+      :rtype: pointer
+      :returns: a long integer in pointer format. Use it to find the new location of the page after the document has been laid out again. Do not modify or re-assign it.
+
+
+    .. method:: findBookmark(bookmark)
+
+      *(New in v.1.17.3)* Return the new page location after the document has been laid out again.
+
+      :arg pointer bookmark: created by :meth:`Document.makeBookmark`.
+
+      :rtype: tuple
+      :returns: the new (chapter, pno) of the page.
+
+
+    .. method:: chapterPageCount(chapter)
+
+      *(New in v.1.17.0)* Return the number of pages of a chapter.
+
+      :arg int chapter: the 0-based chapter number.
+
+      :rtype: int
+      :returns: number of pages in chapter. Relevant only for document types with chapter support (EPUB currently).
+
+
+    .. method:: nextLocation(page_id)
+
+      *(New in v.1.17.0)* Return the location of the following page.
+
+      :arg tuple page_id: the current page id. This must be a tuple *(chapter, pno)* identifying an existing page.
+
+      :returns: The tuple of the following page, i.e. either *(chapter, pno + 1)* or *(chapter + 1, 0)*, **or** the empty tuple *()* if the argument was the last page. Relevant only for document types with chapter support (EPUB currently).
+
+
+    .. method:: previousLocation(page_id)
+
+      *(New in v.1.17.0)* Return the location of the preceding page.
+
+      :arg tuple page_id: the current page id. This must be a tuple *(chapter, pno)* identifying an existing page.
+
+      :returns: The tuple of the preceding page, i.e. either *(chapter, pno - 1)* or the last page of the preceding chapter, **or** the empty tuple *()* if the argument was the first page. Relevant only for document types with chapter support (EPUB currently).
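The stepping rules of `nextLocation` and `previousLocation` can be modelled in pure Python. This is a sketch under stated assumptions: `next_location` and `previous_location` are hypothetical helpers, `counts[c]` stands in for `doc.chapterPageCount(c)`, and every chapter is assumed to have at least one page. It does not reflect how MuPDF computes locations internally.

```python
from typing import Sequence, Tuple


def next_location(counts: Sequence[int], loc: Tuple[int, int]) -> tuple:
    """Mimic nextLocation(); counts[c] is the page count of chapter c."""
    chapter, pno = loc
    if pno + 1 < counts[chapter]:
        return (chapter, pno + 1)      # same chapter, next page
    if chapter + 1 < len(counts):
        return (chapter + 1, 0)        # first page of the next chapter
    return ()                          # loc was the last page


def previous_location(counts: Sequence[int], loc: Tuple[int, int]) -> tuple:
    """Mimic previousLocation() under the same assumptions."""
    chapter, pno = loc
    if pno > 0:
        return (chapter, pno - 1)                      # same chapter
    if chapter > 0:
        return (chapter - 1, counts[chapter - 1] - 1)  # last page of previous chapter
    return ()                                          # loc was the first page


counts = [3, 2]  # e.g. [doc.chapterPageCount(c) for c in range(doc.chapterCount)]
print(next_location(counts, (0, 2)), previous_location(counts, (1, 0)))
```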
+
+
+    .. method:: loadPage(page_id=0)
+
+      Create a :ref:`Page` object for further processing (like rendering, text searching, etc.).
+
+      *(Changed in v1.17.0)* For document types supporting a so-called "chapter structure" (like EPUB), pages can also be loaded via the combination of chapter number and relative page number, instead of the absolute page number. This should **significantly speed up access** for large documents.
+
+      :arg int,tuple page_id: *(Changed in v1.17.0)*
+      
+          Either a 0-based page number, or a tuple *(chapter, pno)*. For an **integer**, any *-inf < page_id < pageCount* is acceptable: as long as page_id is negative, :attr:`pageCount` is added to it. For example, to load the last page, you can use *doc.loadPage(-1)*, which yields a page with *page.number == doc.pageCount - 1*.
+      
+          For a tuple, *chapter* must be in range :attr:`Document.chapterCount`, and *pno* must be in range :meth:`Document.chapterPageCount` of that chapter. Both values are 0-based. Using this notation, :attr:`Page.number` will equal the given tuple. Relevant only for document types with chapter support (EPUB currently).
+
+      :rtype: :ref:`Page`
+
+    .. note::
+    
+       Documents also follow the Python sequence protocol with page numbers as indices: *doc.loadPage(n) == doc[n]*.
+       
+       For **absolute page numbers** only, expressions like *"for page in doc: ..."* and *"for page in reversed(doc): ..."* will successively yield the document's pages. Refer to :meth:`Document.pages` which allows processing pages as with slicing.
+
+       You can also use index notation with the new chapter-based page identification: use *page = doc[(5, 2)]* to load the third page of the sixth chapter.
+
+       To maintain a consistent API, for document types not supporting a chapter structure (like PDFs), :attr:`Document.chapterCount` is 1, and pages can also be loaded via tuples *(0, pno)*. See this [#f3]_ footnote for comments on performance improvements.
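The relationship between absolute page numbers and *(chapter, pno)* tuples can be illustrated with a small model. `normalize_page_number` and `to_location` are hypothetical helpers that apply the documented rules without calling MuPDF, and `counts` stands in for the per-chapter page counts. For a PDF, `counts` would be `[doc.pageCount]`, so every page maps to `(0, pno)`.

```python
from typing import Sequence, Tuple


def normalize_page_number(page_id: int, page_count: int) -> int:
    """Apply the documented rule for negative integer page numbers."""
    if page_count <= 0 or page_id >= page_count:
        raise ValueError("page number out of range")
    while page_id < 0:          # add pageCount as long as the value is negative
        page_id += page_count
    return page_id


def to_location(page_id: int, counts: Sequence[int]) -> Tuple[int, int]:
    """Map an absolute page number to (chapter, pno); counts[c] = pages of chapter c."""
    pno = normalize_page_number(page_id, sum(counts))
    for chapter, n in enumerate(counts):
        if pno < n:
            return (chapter, pno)
        pno -= n                # skip the pages of this chapter
    raise ValueError("inconsistent chapter page counts")  # not reached for valid input


print(to_location(-1, [3, 2]))  # last page of a 2-chapter document
```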
+
+    .. method:: reload_page(page)
+
+      *(New in version 1.16.10)*
+  
+      PDF only: Provide a new copy of a page after finishing and updating all pending changes.
+
+      :arg page: page object.
+      :type page: :ref:`Page`
+
+      :rtype: :ref:`Page`
+
+      :returns: a new copy of the same page. All pending updates (e.g. to annotations or widgets) will be finalized and a fresh copy of the page will be loaded.
+        .. note:: In a typical use case, a page :ref:`Pixmap` should be taken after annotations / widgets have been added or changed. To ensure all those changes are reflected in the page structure, this method re-instates a fresh copy while keeping the object hierarchy "document -> page -> annotation(s)" intact.
+
+
+    .. method:: pages(start=None, [stop=None, [step=None]])
+
+      *(New in version 1.16.4)*
+      
+      A generator for a given range of pages. Parameters have the same meaning as in the built-in function *range()*. Intended for expressions of the form *"for page in doc.pages(start, stop, step): ..."*.
+
+      :arg int start: start iteration with this page number. Default is zero, allowed values are -inf < start < pageCount. While this is negative, :attr:`pageCount` is added **before** starting the iteration.
+      :arg int stop: stop iteration at this page number. Default is :attr:`pageCount`, allowed values are -inf < stop <= pageCount. Larger values are **silently replaced** by the default. Negative values will cyclically emit the pages in reversed order. As with the built-in *range()*, this is the first page **not** returned.
+      :arg int step: stepping value. Defaults are 1 if start < stop and -1 if start > stop. Zero is not allowed.
+
+      :returns: a generator iterator over the document's pages. Some examples:
+
+          * "doc.pages()" emits all pages.
+          * "doc.pages(4, 9, 2)" emits pages 4, 6, 8.
+          * "doc.pages(0, None, 2)" emits all pages with even numbers.
+          * "doc.pages(-2)" emits the last two pages.
+          * "doc.pages(-1, -1)" emits all pages in reversed order.
+          * "doc.pages(-1, -10)" emits pages in reversed order, starting with the last page **repeatedly**. For a 4-page document the following page numbers are emitted: 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3.
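The documented examples can be reproduced with a pure-Python model that yields page numbers instead of :ref:`Page` objects. This is a sketch, not PyMuPDF's implementation. `page_numbers` is hypothetical, and it assumes that each negative number produced by the underlying *range()* is normalized the same way as *doc.loadPage(-n)*, which Python's `%` operator reproduces.

```python
from typing import Iterator, Optional


def page_numbers(page_count: int, start: Optional[int] = None,
                 stop: Optional[int] = None, step: Optional[int] = None) -> Iterator[int]:
    """Yield the page numbers Document.pages() is documented to emit."""
    if page_count <= 0:
        raise ValueError("document has no pages")
    if start is None:
        start = 0
    while start < 0:                    # negative start: add pageCount first
        start += page_count
    if start >= page_count:
        raise ValueError("bad start page")
    if stop is None or stop > page_count:
        stop = page_count               # larger values silently become the default
    if step is None:
        step = 1 if start < stop else -1
    if step == 0:
        raise ValueError("step must not be zero")
    for pno in range(start, stop, step):
        yield pno % page_count          # negative numbers wrap around cyclically


print(list(page_numbers(4, -1, -10)))   # the 4-page example from the list above
```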
+
+    .. index::
+       pair: from_page; convertToPDF (Document method)
+       pair: to_page; convertToPDF (Document method)
+       pair: rotate; convertToPDF (Document method)
+
+    .. method:: convertToPDF(from_page=-1, to_page=-1, rotate=0)
+
+      Create a PDF version of the current document and write it to memory. **All document types** (except PDF) are supported. The parameters have the same meaning as in :meth:`insertPDF`. In essence, you can restrict the conversion to a page subset, specify page rotation, and reverse the page sequence.
+
+      :arg int from_page: first page to copy (0-based). Default is first page.
+
+      :arg int to_page: last page to copy (0-based). Default is last page.
+
+      :arg int rotate: rotation angle. Default is 0 (no rotation). Should be *n * 90* with an integer n (not checked).
+
+      :rtype: bytes
+      :returns: a Python *bytes* object containing a PDF file image. It is created by internally using *write(garbage=4, deflate=True)*. See :meth:`write`. You can output it directly to disk or open it as a PDF. Here are some examples::
+
+          >>> # convert an XPS file to PDF
+          >>> xps = fitz.open("some.xps")
+          >>> pdfbytes = xps.convertToPDF()
+          >>>
+          >>> # either do this --->
+          >>> pdf = fitz.open("pdf", pdfbytes)
+          >>> pdf.save("some.pdf")
+          >>>
+          >>> # or this --->
+          >>> pdfout = open("some.pdf", "wb")
+          >>> pdfout.write(pdfbytes)
+          >>> pdfout.close()
+
+          >>> # copy image files to PDF pages
+          >>> # each page will have image dimensions
+          >>> doc = fitz.open()                     # new PDF
+          >>> imglist = [ ... image file names ...] # e.g. a directory listing
+          >>> for img in imglist:
+          ...     imgdoc = fitz.open(img)           # open image as a document
+          ...     pdfbytes = imgdoc.convertToPDF()  # make a 1-page PDF of it
+          ...     imgpdf = fitz.open("pdf", pdfbytes)
+          ...     doc.insertPDF(imgpdf)             # insert the image PDF
+          ...
+          >>> doc.save("allmyimages.pdf")
+
+      .. note:: The method uses the same logic as the *mutool convert* CLI. This works very well in most cases -- however, beware of the following limitations.
+
+        * Image files: perfect, no issues detected. However, image transparency appears to be ignored. If you need it (e.g. for a watermark), use :meth:`Page.insertImage` instead. Otherwise, this method is recommended for its much better performance.
+        * XPS: appearance very good. Links work fine, outlines (bookmarks) are lost, but can easily be recovered [#f2]_.
+        * EPUB, CBZ, FB2: similar to XPS.
+        * SVG: medium. Roughly comparable to `svglib <https://github.com/deeplook/svglib>`_.
+
+    .. method:: getToC(simple=True)
+
+      Creates a table of contents out of the document's outline chain.
+
+      :arg bool simple: Indicates whether a simple or a detailed ToC is required. If *simple == False*, each entry of the list also contains a dictionary with :ref:`linkDest` details for each outline entry.
+
+      :rtype: list
+
+      :returns: a list of lists. Each entry has the form *[lvl, title, page, dest]*. Its entries have the following meanings:
+
+        * *lvl* -- hierarchy level (positive *int*). The first entry is always 1. From one entry to the next, the level either stays **equal**, **increases** by 1, or **decreases** by any number.
+        * *title* -- title (*str*)
+        * *page* -- 1-based page number (*int*). Page numbers *< 1* either indicate a target outside this document or no target at all (see next entry).
+        * *dest* -- (*dict*) included only if *simple=False*. Contains details of the link destination.
+
+    .. method:: getPagePixmap(pno, *args, **kwargs)
+
+      Creates a pixmap from page *pno* (zero-based). Invokes :meth:`Page.getPixmap`.
+
+      :arg int pno: page number, 0-based in -inf < pno < pageCount.
+
+      :rtype: :ref:`Pixmap`
+
+    .. method:: getPageXObjectList(pno)
+
+      PDF only: *(New in v1.16.13)* Return a list of all XObjects referenced by a page.
+
+      :arg int pno: page number, 0-based, *-inf < pno < pageCount*.
+
+      :rtype: list
+      :returns: a list of (non-image) XObjects. These objects typically represent pages *embedded* (not copied) from other PDFs. For example, :meth:`Page.showPDFpage` will create this type of object. An item of this list has the following layout: **(xref, name, invoker, bbox)**, where
+
+        * **xref** (*int*) is the XObject's :data:`xref`
+        * **name** (*str*) is the symbolic name to reference the XObject
+        * **invoker** (*int*) the :data:`xref` of the invoking XObject or zero if the page directly invokes it
+        * **bbox** (*tuple*) the boundary box of the XObject's location on the page **in untransformed coordinates**. To get actual, non-rotated page coordinates, multiply with the page's transformation matrix :meth:`Page.getTransformation`.
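Converting an untransformed *bbox* means applying a PDF matrix *(a, b, c, d, e, f)* to the rectangle's corners and taking their bounding box. The following is a minimal pure-Python sketch of that arithmetic. `transform_bbox` is hypothetical, and the y-flipping matrix in the example is only an assumed value for an unrotated page 842 points high. In PyMuPDF itself, `fitz.Rect(bbox) * page.getTransformation()` should perform the equivalent transformation.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]


def transform_bbox(bbox: Rect, m: Sequence[float]) -> Rect:
    """Return the bounding box of bbox after applying PDF matrix m = (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    x0, y0, x1, y1 = bbox
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    xs = [a * x + c * y + e for x, y in corners]   # x' = a*x + c*y + e
    ys = [b * x + d * y + f for x, y in corners]   # y' = b*x + d*y + f
    return (min(xs), min(ys), max(xs), max(ys))


# Assumed y-flipping matrix of the kind an unrotated 842pt-high page might use
print(transform_bbox((0, 0, 100, 50), (1, 0, 0, -1, 0, 842)))
```

Transforming all four corners (not just two) keeps the result correct for rotations, where the original top-left corner need not stay top-left.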
+
+
+    .. method:: getPageImageList(pno, full=False)
+
+      PDF only: Return a list of all image descriptions referenced by a page.
+
+      :arg int pno: page number, 0-based, *-inf < pno < pageCount*.
+      :arg bool full: whether to also include the invoker's :data:`xref` (which is zero if this is the page).
+
+      :rtype: list
+
+      :returns: a list of images shown on this page. Each item looks like
+      
+      **(xref, smask, width, height, bpc, colorspace, alt. colorspace, name, filter, invoker)**
+      
+      Where
+
+        * **xref** (*int*) is the image object number
+        * **smask** (*int*) is the object number of its soft-mask image
+        * **width** and **height** (*ints*) are the image dimensions
+        * **bpc** (*int*) denotes the number of bits per component (normally 8)
+        * **colorspace** (*str*) a string naming the colorspace (like **DeviceRGB**)
+        * **alt. colorspace** (*str*) is any alternate colorspace depending on the value of **colorspace**
+        * **name** (*str*) is the symbolic name by which the image is referenced
+        * **filter** (*str*) is the decode filter of the image (:ref:`AdobeManual`, p. 65).
+        * **invoker** (*int*) the :data:`xref` of the invoker. Zero if directly referenced by the page. Only present if *full=True*.
+
+      See below how this information can be used to extract PDF images as separate files. Another demonstration::
+
+        >>> doc = fitz.open("pymupdf.pdf")
+        >>> doc.getPageImageList(0, full=True)
+        [[316, 0, 261, 115, 8, 'DeviceRGB', '', 'Im1', 'DCTDecode', 0]]
+        >>> pix = fitz.Pixmap(doc, 316)  # 316 is the xref of the image
+        >>> pix
+        fitz.Pixmap(DeviceRGB, fitz.IRect(0, 0, 261, 115), 0)
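Because the items are plain positional lists, wrapping them in a `namedtuple` can make extraction code easier to read. This is a sketch: the `ImageInfo` type and its field names are hypothetical choices that follow the documented layout, and *invoker* defaults to `None` for lists obtained with *full=False*.

```python
from collections import namedtuple

# Field names follow the documented item layout; "invoker" only exists if full=True.
ImageInfo = namedtuple(
    "ImageInfo",
    "xref smask width height bpc colorspace alt_colorspace name filter invoker",
    defaults=(None,),
)

item = [316, 0, 261, 115, 8, "DeviceRGB", "", "Im1", "DCTDecode", 0]  # sample output above
info = ImageInfo(*item)
print(info.xref, info.name, info.filter, info.smask > 0)
```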
+
+    .. method:: getPageFontList(pno, full=False)
+
+      PDF only: Return a list of all fonts referenced by the page.
+
+      :arg int pno: page number, 0-based, -inf < pno < pageCount.
+      :arg bool full: whether to also include the invoker's :data:`xref` (which is zero if directly referenced by the page).
+
+      :rtype: list
+
+      :returns: a list of fonts referenced by this page. Each entry looks like
+        
+      **(xref, ext, type, basefont, name, encoding, invoker)**,
+        
+      where
+
+          * **xref** (*int*) is the font object number (may be zero if the PDF uses one of the builtin fonts directly)
+          * **ext** (*str*) font file extension (e.g. "ttf", see :ref:`FontExtensions`)
+          * **type** (*str*) is the font type (like "Type1" or "TrueType" etc.)
+          * **basefont** (*str*) is the base font name
+          * **name** (*str*) is the symbolic name by which the font is referenced
+          * **encoding** (*str*) the font's character encoding if different from its built-in encoding (:ref:`AdobeManual`, p. 414)
+          * **invoker** (*int*, optional) the :data:`xref` of the invoker. Zero if directly referenced by the page. Only present if *full=True*.
+
+      Example::
+
+          >>> doc = fitz.open("some.pdf")
+          >>> for f in doc.getPageFontList(0, full=False): print(f)
+          [24, 'ttf', 'TrueType', 'DOKBTG+Calibri', 'R10', '']
+          [17, 'ttf', 'TrueType', 'NZNDCL+CourierNewPSMT', 'R14', '']
+          [32, 'ttf', 'TrueType', 'FNUUTH+Calibri-Bold', 'R8', '']
+          [28, 'ttf', 'TrueType', 'NOHSJV+Calibri-Light', 'R12', '']
+          [8, 'ttf', 'Type0', 'ECPLRU+Calibri', 'R23', 'Identity-H']
+
+      .. note:: This list has no duplicate entries: the combination of :data:`xref` and *name* is unique. But by themselves, each of the two may occur multiple times. Duplicate *name* entries indicate the presence of "Form XObjects" on the page, e.g. generated by :meth:`Page.showPDFpage`.
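Following the note above, repeated *name* values can serve as a quick hint that Form XObjects are present. This is a sketch: `duplicate_font_names` is a hypothetical helper that only counts item index 4 (*name*), and the font entries below are made-up data in the documented layout.

```python
from collections import Counter
from typing import List, Sequence


def duplicate_font_names(fontlist: Sequence[Sequence]) -> List[str]:
    """Return symbolic font names (item index 4) that occur more than once."""
    counts = Counter(entry[4] for entry in fontlist)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up entries in the layout (xref, ext, type, basefont, name, encoding)
fonts = [
    [24, "ttf", "TrueType", "DOKBTG+Calibri", "R10", ""],
    [41, "ttf", "TrueType", "ABCDEF+Arial", "R10", ""],
    [17, "ttf", "TrueType", "NZNDCL+CourierNewPSMT", "R14", ""],
]
print(duplicate_font_names(fonts))
```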
+
+    .. method:: getPageText(pno, output="text")
+
+      Extracts the text of a page given its page number *pno* (zero-based). Invokes :meth:`Page.getText`.
+
+      :arg int pno: page number, 0-based, any value *-inf < pno < pageCount*.
+
+      :arg str output: A string specifying the requested output format: text, html, json or xml. Default is *text*.
+
+      :rtype: str
+
+    .. index::
+       pair: fontsize; layout (Document method)
+       pair: rect; layout (Document method)
+       pair: width; layout (Document method)
+       pair: height; layout (Document method)
+
+    .. method:: layout(rect=None, width=0, height=0, fontsize=11)
+
+      Re-paginate ("reflow") the document based on the given page dimension and fontsize. This only affects some document types like e-books and HTML. Ignored if not supported. Supported documents have *True* in property :attr:`isReflowable`.
+
+      :arg rect_like rect: desired page size. Must be finite, not empty and start at point (0, 0).
+      :arg float width: use it together with *height* as alternative to *rect*.
+      :arg float height: use it together with *width* as alternative to *rect*.
+      :arg float fontsize: the desired default fontsize.
+
+    .. method:: select(s)
+
+      PDF only: Keeps only those pages of the document whose numbers occur in the list. Empty sequences or elements outside *range(len(doc))* will cause a *ValueError*. For more details see the remarks at the bottom of this chapter.
+
+      :arg sequence s: The sequence (see :ref:`SequenceTypes`) of page numbers (zero-based) to be included. Pages not in the sequence will be deleted (from memory) and become unavailable until the document is reopened. **Page numbers can occur multiple times and in any order:** the resulting document will reflect the sequence exactly as specified.
+
+      .. note::
+
+          * Page numbers in the sequence need not be unique nor be in any particular order. This makes the method a versatile utility, e.g. to select only the even or the odd pages, or pages meeting some other criterion.
+
+          * On a technical level, the method will always create a new :data:`pagetree`.
+
+          * When dealing with only a few pages, methods :meth:`copyPage`, :meth:`movePage`, :meth:`deletePage` are easier to use. In fact, they are also **much faster** -- by at least one order of magnitude when the document has many pages.
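Typical page sequences for this method can be built with ordinary Python expressions. This is a sketch: `check_selection` is a hypothetical pre-check that mirrors the documented *ValueError* conditions, and `n` stands in for `len(doc)`. The resulting lists would then be passed to `doc.select(...)`.

```python
from typing import Iterable, List


def check_selection(s: Iterable[int], page_count: int) -> List[int]:
    """Reject sequences that select() is documented to refuse with ValueError."""
    pages = list(s)
    if not pages:
        raise ValueError("empty page sequence")
    if any(not 0 <= pno < page_count for pno in pages):
        raise ValueError("page number outside range(len(doc))")
    return pages


n = 6  # stand-in for len(doc)
even_numbers = check_selection(range(0, n, 2), n)           # 0-based 0, 2, 4
reversed_doc = check_selection(range(n - 1, -1, -1), n)     # last page first
doubled = check_selection([p for p in range(n) for _ in range(2)], n)  # each page twice
print(even_numbers, reversed_doc, doubled)
```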
+
+
+    .. method:: setMetadata(m)
+
+      PDF only: Sets or updates the metadata of the document as specified in *m*, a Python dictionary. As with :meth:`select`, these changes become permanent only when you save the document. Incremental save is supported.
+
+      :arg dict m: A dictionary with the same keys as *metadata* (see below). All keys are optional. A PDF's format and encryption method cannot be set or changed and will be ignored. If a value should be empty, omit its key or set the value to *None*. If you use *{}*, all metadata information will be cleared to the string *"none"*. If you want to selectively change only some values, modify a copy of *doc.metadata* and use it as the argument. Arbitrary unicode values are possible if specified as UTF-8-encoded.
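The recommended "modify a copy" pattern can be sketched as follows. `edited_metadata` is a hypothetical helper, and the dictionary is made-up sample data standing in for *doc.metadata*. The original dictionary stays untouched, and the result would then be passed to `doc.setMetadata(...)`.

```python
from typing import Any, Dict


def edited_metadata(current: Dict[str, Any], **changes: Any) -> Dict[str, Any]:
    """Return a modified copy of a metadata dict, leaving the original untouched."""
    new = dict(current)   # shallow copy, as recommended for selective changes
    new.update(changes)
    return new


# Made-up sample standing in for doc.metadata
meta = {"format": "PDF 1.7", "title": "Old title", "author": "Jane Doe", "encryption": None}
m = edited_metadata(meta, title="New title")
# doc.setMetadata(m) would store it; "format" and "encryption" are ignored there
print(m["title"], meta["title"])
```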
+
+    .. method:: setToC(toc, collapse=1)
+
+      PDF only: Replaces the **complete current outline** tree (table of contents) with the new one provided as the argument. After successful execution, the new outline tree can be accessed as usual via method *getToC()* or via property *outline*. As with other output-oriented methods, changes become permanent only via *save()* (incremental save supported). Internally, this method consists of the following two steps. For a demonstration see example below.
+
+      - Step 1 deletes all existing bookmarks.
+
+      - Step 2 creates a new TOC from the entries contained in *toc*.
+
+      :arg sequence toc:
+
+          A Python sequence (list or tuple) with **all bookmark entries** that should form the new table of contents. Output variants of :meth:`getToC` are acceptable. To completely remove the table of contents specify an empty sequence or None. Each item must be a list with the following format.
+
+          * [lvl, title, page [, dest]] where
+
+            - **lvl** is the hierarchy level (int > 0) of the item, which **must be 1** for the first item and at most 1 larger than the previous one.
+
+            - **title** (str) is the title to be displayed. It is assumed to be UTF-8-encoded (relevant for multibyte code points only).
+
+            - **page** (int) is the target page number **(attention: 1-based)**. Must be in valid range if positive. Set it to -1 if there is no target, or the target is external.
+
+            - **dest** (optional) is a dictionary or a number. If a number, it will be interpreted as the desired height (in points) this entry should point to on the page. Use a dictionary (like the one given as output by *getToC(False)*) if you want to store destinations that are either "named", or reside outside this document (other files, internet resources, etc.).
+
+      :arg int collapse: *(new in version 1.16.9)* controls the hierarchy level beyond which outline entries should initially show up collapsed. The default 1 will hence only display level 1, higher levels must be expanded in the PDF viewer. To completely expand specify either a large integer, 0 or None.
+
+      :rtype: int
+      :returns: the number of inserted items or, if *toc* is empty, the number of deleted items.
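Before calling this method, a TOC list can be checked against the documented item rules. This is a sketch: `check_toc` is a hypothetical helper, and PyMuPDF performs its own validation. The helper only covers the *lvl*, *title*, and *page* constraints described above and ignores *dest*.

```python
from typing import Sequence


def check_toc(toc: Sequence[Sequence], page_count: int) -> bool:
    """Check the documented constraints on setToC() items [lvl, title, page, ...]."""
    prev_lvl = 0
    for item in toc:
        lvl, title, page = item[0], item[1], item[2]
        # first item must be level 1; later items may rise by at most 1
        if not 1 <= lvl <= prev_lvl + 1:
            raise ValueError(f"bad hierarchy level {lvl} for {title!r}")
        if not isinstance(title, str):
            raise ValueError("title must be a string")
        if page > page_count:  # 1-based; values < 1 mean "no target"
            raise ValueError(f"page {page} out of range for {title!r}")
        prev_lvl = lvl
    return True


toc = [[1, "Intro", 1], [2, "Details", 2], [3, "More", 2], [1, "Links", -1]]
print(check_toc(toc, page_count=3))
```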
+
+
+    .. method:: can_save_incrementally()
+
+      *(New in version 1.16.0)*
+      
+      Check whether the document can be saved incrementally. Use it to choose the right option without encountering exceptions.
+
+    .. method:: scrub(attached_files=True, clean_pages=True, embedded_files=True, hidden_text=True, javascript=True, metadata=True, redactions=True, remove_links=True, reset_fields=True, reset_responses=True, xml_metadata=True)
+
+      PDF only: *(New in v1.16.14)* Remove potentially sensitive data from the PDF. This function is inspired by the similar "Sanitize" function in Adobe Acrobat products. The process is configurable by a number of options, which are all *True* by default.
+
+      :arg bool attached_files: Search for 'FileAttachment' annotations and remove the file content.
+      :arg bool clean_pages: Remove any comments from page painting sources. If this option is set to *False*, then *hidden_text* and *redactions* are also treated as *False*.
+      :arg bool embedded_files: Remove embedded files.
+      :arg bool hidden_text: Remove OCR-ed text and invisible text.
+      :arg bool javascript: Remove JavaScript sources.
+      :arg bool metadata: Remove PDF standard metadata.
+      :arg bool redactions: Apply redaction annotations.
+      :arg bool remove_links: Remove all links.
+      :arg bool reset_fields: Reset all form fields to their defaults.
+      :arg bool reset_responses: Remove all responses from all annotations.
+      :arg bool xml_metadata: Remove XML metadata.
+
+
+    .. method:: save(outfile, garbage=0, clean=False, deflate=False, incremental=False, ascii=False, expand=0, linear=False, pretty=False, encryption=PDF_ENCRYPT_NONE, permissions=-1, owner_pw=None, user_pw=None)
+
+      PDF only: Saves the document in its **current state**.
+
+      :arg str outfile: The file path to save to. Must be different from the original value if "incremental" is false or zero. When saving incrementally, "garbage" and "linear" **must be** false or zero and this parameter **must equal** the original filename (for convenience use *doc.name*).
+
+      :arg int garbage: Do garbage collection. Positive values exclude "incremental".
+
+       * 0 = none
+       * 1 = remove unused objects
+       * 2 = in addition to 1, compact the :data:`xref` table
+       * 3 = in addition to 2, merge duplicate objects
+       * 4 = in addition to 3, check object streams for duplication (may be slow)
+
+      :arg bool clean: Clean and sanitize content streams [#f1]_. Corresponds to "mutool clean -sc".
+
+      :arg bool deflate: Deflate (compress) uncompressed streams.
+
+      :arg bool incremental: Only save changed objects. Excludes "garbage" and "linear". Cannot be used for files that were decrypted or repaired, nor in some other cases. To be sure, check :meth:`Document.can_save_incrementally`; if it returns false, saving to a new file is required.
+
+      :arg bool ascii: convert binary data to ASCII.
+
+      :arg int expand: Decompress objects. This generates versions that some other programs can read more easily, at the cost of larger files.
+
+       * 0 = none
+       * 1 = images
+       * 2 = fonts
+       * 255 = all
+
+      :arg bool linear: Save a linearised version of the document. This option creates a file format for improved performance when read via internet connections. Excludes "incremental".
+
+      :arg bool pretty: Prettify the document source for better readability. PDF objects will be reformatted to look like the default output of :meth:`Document.xrefObject`.
+
+      :arg int permissions: *(new in version 1.16.0)* Set the desired permission levels. See :ref:`PermissionCodes` for possible values. Default is granting all.
+
+      :arg int encryption: *(new in version 1.16.0)* set the desired encryption method. See :ref:`EncryptionMethods` for possible values.
+
+      :arg str owner_pw: *(new in version 1.16.0)* set the document's owner password.
+
+      :arg str user_pw: *(new in version 1.16.0)* set the document's user password.
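The documented interplay of *outfile*, *incremental*, *garbage* and *linear* can be expressed as a small pre-flight check. This is a sketch: `check_save_args` is a hypothetical helper, not a PyMuPDF function, and PyMuPDF raises its own exceptions. The helper does not cover encryption or the other cases in which :meth:`Document.can_save_incrementally` returns false.

```python
def check_save_args(outfile: str, original: str, incremental: bool = False,
                    garbage: int = 0, linear: bool = False) -> None:
    """Raise ValueError for option combinations that save() documents as invalid."""
    if garbage not in range(5):
        raise ValueError("garbage must be one of 0, 1, 2, 3, 4")
    if incremental:
        if garbage or linear:
            raise ValueError("incremental excludes garbage and linear")
        if outfile != original:
            raise ValueError("incremental save must target the original file")
    elif outfile == original:
        raise ValueError("non-incremental save needs a different file name")


check_save_args("some.pdf", "some.pdf", incremental=True)   # like doc.saveIncr()
check_save_args("out.pdf", "some.pdf", garbage=4, linear=True)
```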
+
+    .. method:: saveIncr()
+
+      PDF only: saves the document incrementally. This is a convenience abbreviation for *doc.save(doc.name, incremental=True, encryption=PDF_ENCRYPT_KEEP)*.
+
+
+    .. method:: write(garbage=0, clean=False, deflate=False, ascii=False, expand=0, linear=False, pretty=False, encryption=PDF_ENCRYPT_NONE, permissions=-1, owner_pw=None, user_pw=None)
+
+      PDF only: Writes the **current content of the document** to a bytes object instead of to a file. Be mindful of the memory requirements. The parameters have exactly the same meaning as in :meth:`save`. Chapter :ref:`FAQ` contains an example for using this method as a pre-processor to `pdfrw <https://pypi.python.org/pypi/pdfrw/0.3>`_.
+
+      *(Changed in version 1.16.0)* for extended encryption support.
+
+      :rtype: bytes
+      :returns: a bytes object containing the complete document.
+
+    .. method:: searchPageFor(pno, text, hit_max=16, quads=False)
+
+       Search for "text" on page number "pno". Works exactly like the corresponding :meth:`Page.searchFor`. Any integer -inf < pno < pageCount is acceptable.
+
+    .. index::
+       pair: from_page; insertPDF (Document method)
+       pair: to_page; insertPDF (Document method)
+       pair: start_at; insertPDF (Document method)
+       pair: rotate; insertPDF (Document method)
+       pair: links; insertPDF (Document method)
+       pair: annots; insertPDF (Document method)
+
+    .. method:: insertPDF(docsrc, from_page=-1, to_page=-1, start_at=-1, rotate=-1, links=True, annots=True)
+
+      PDF only: Copy the page range **[from_page, to_page]** (including both) of PDF document *docsrc* into the current one. Inserts will start with page number *start_at*. Negative values can be used to indicate default values. All pages thus copied will be rotated as specified. Links can be excluded in the target, see below. All page numbers are zero-based.
+
+      :arg docsrc: An opened PDF *Document* which must not be the current document object. However, it may refer to the same underlying file.
+      :type docsrc: *Document*
+
+      :arg int from_page: First page number in *docsrc*. Default is zero.
+
+      :arg int to_page: Last page number in *docsrc* to copy. Default is the last page.
+
+      :arg int start_at: First copied page will become page number *start_at* in the destination. If omitted, the page range will be appended to current document. If zero, the page range will be inserted before current first page.
+
+      :arg int rotate: All copied pages will be rotated by the provided value (degrees, integer multiple of 90).
+
+      :arg bool links: Choose whether (internal and external) links should be included in the copy. Default is *True*. An **internal link is always excluded** if its destination is not one of the copied pages.
+      :arg bool annots: *(new in version 1.16.1)* choose whether annotations should be included in the copy. Default is *True*.
+      
+    .. note::
+
+       1. If *from_page > to_page*, pages will be **copied in reverse order**. If *0 <= from_page == to_page*, then one page will be copied.
+
+       2. *docsrc* bookmarks **will not be copied**. However, it is easy to recover a table of contents for the resulting document. Look at the examples below and at program `PDFjoiner.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/PDFjoiner.py>`_ in the *examples* directory: it can join PDF documents and at the same time piece together the respective parts of their tables of contents.
+
+    .. index::
+       pair: width; newPage (Document method)
+       pair: height; newPage (Document method)
+
+    .. method:: newPage(pno=-1, width=595, height=842)
+
+      PDF only: Insert an empty page.
+
+      :arg int pno: page number in front of which the new page should be inserted. Must be in *range(-1, len(doc) + 1)*. Special values -1 and *len(doc)* insert **after** the last page.
+
+      :arg float width: page width in points (the default width and height correspond to ISO A4 portrait).
+      :arg float height: page height in points.
+
+      :rtype: :ref:`Page`
+      :returns: the created page object.
+
+    .. index::
+       pair: fontsize; insertPage (Document method)
+       pair: width; insertPage (Document method)
+       pair: height; insertPage (Document method)
+       pair: fontname; insertPage (Document method)
+       pair: fontfile; insertPage (Document method)
+       pair: color; insertPage (Document method)
+
+    .. method:: insertPage(pno, text=None, fontsize=11, width=595, height=842, fontname="helv", fontfile=None, color=None)
+
+      PDF only: Insert a new page and insert some text. Convenience function which combines :meth:`Document.newPage` and (parts of) :meth:`Page.insertText`.
+
+      :arg int pno: page number (0-based) **in front of which** to insert. Must be in *range(-1, len(doc) + 1)*. Special values -1 and *len(doc)* insert **after** the last page.
+
+          Changed in version 1.14.12
+             This is now a positional parameter
+
+      For the other parameters, please consult the aforementioned methods.
+
+      :rtype: int
+      :returns: the result of :meth:`Page.insertText` (number of successfully inserted lines).
+
+    .. method:: deletePage(pno=-1)
+
+      PDF only: Delete a page given by its 0-based number in *-inf < pno < pageCount*.
+
+      Changed in version 1.14.17
+
+      :arg int pno: the page to be deleted. Negative numbers count backwards from the end of the document (like Python indices). Default is the last page.
+
+    .. method:: deletePageRange(from_page=-1, to_page=-1)
+
+      PDF only: Delete a range of pages given as 0-based numbers. Any *-1* parameter will first be replaced by *len(doc) - 1* (i.e. the last page number). After that, condition *0 <= from_page <= to_page < len(doc)* must be true. If the parameters are equal, this is equivalent to :meth:`deletePage`.
+
+      *(Changed in version 1.14.17)* Table of contents and internal links are now resynchronized.
+
+      :arg int from_page: the first page to be deleted.
+
+      :arg int to_page: the last page to be deleted.
+
+      .. note::
+
+        In an effort to maintain a valid PDF structure, this method and :meth:`deletePage` will also remove the deleted pages from the table of contents.
+
+        Similarly, it will **scan all pages** of the PDF and remove any links that point to deleted pages. For documents with many pages, this can take a noticeable amount of time.
+
+        The **number of deleted pages**, on the other hand, has only a very small effect on response time. Therefore, whenever possible, delete page **ranges** instead of single pages.
+
+        Example: Delete the page range 500 to 520 from a large PDF, using different methods.
+
+        Method 1 - *deletePageRange*::
+
+          >>> import time, fitz
+          >>> doc = fitz.open("Adobe PDF Reference 1-7.pdf")
+          >>> t0 = time.perf_counter(); doc.deletePageRange(500, 520); t1 = time.perf_counter()
+          >>> round(t1 - t0, 2)
+          0.66
+
+
+        Method 2 - *select*, this is more than 10 times **slower**::
+
+          >>> doc = fitz.open("Adobe PDF Reference 1-7.pdf")  # start again from the unmodified file
+          >>> keep = list(range(500)) + list(range(521, 1310))  # every page except 500 to 520
+          >>> t0 = time.perf_counter(); doc.select(keep); t1 = time.perf_counter()
+          >>> round(t1 - t0, 2)
+          7.62
+
+
+    .. method:: copyPage(pno, to=-1)
+
+      PDF only: Copy a page reference within the document.
+
+      :arg int pno: the page to be copied. Must be in range *0 <= pno < len(doc)*.
+
+      :arg int to: the page number in front of which to copy. The default inserts **after** the last page.
+
+      .. note:: Only a new **reference** to the page object is created, not a new page object. All copies therefore have identical attribute values, including :attr:`Page.xref`, and any change to one of them will appear on all of them.
+
+    .. method:: fullcopyPage(pno, to=-1)
+
+      *(New in version 1.14.17)*
+      
+      PDF only: Make a new copy (duplicate) of a page.
+
+      :arg int pno: the page to be duplicated. Must be in range *0 <= pno < len(doc)*.
+
+      :arg int to: the page number in front of which to copy. The default inserts **after** the last page.
+
+      .. note::
+      
+          * In contrast to :meth:`copyPage`, this method creates a new page object (with a new :data:`xref`), which can be changed independently from the original.
+
+          * Any Popup and "IRT" ("in response to") annotations are **not copied** to avoid potentially incorrect situations.
+
+    .. method:: movePage(pno, to=-1)
+
+      PDF only: Move (copy and then delete original) a page within the document.
+
+      :arg int pno: the page to be moved. Must be in range *0 <= pno < len(doc)*.
+
+      :arg int to: the page number in front of which to insert the moved page. The default moves **after** the last page.
+
+
+    .. method:: need_appearances(value=None)
+
+      *(New in v1.17.4)*
+
+      PDF only: Get or set the */NeedAppearances* property of Form PDFs. Quote: *"(Optional) A flag specifying whether to construct appearance streams and appearance dictionaries for all widget annotations in the document ... Default value: false."* This may help control the behavior of some readers / viewers.
+
+      :arg bool value: set the property to this value. If omitted or *None*, inquire the current value.
+
+      :rtype: bool
+      :returns:
+         * None: not a Form PDF or property not defined.
+         * True / False: the value of the property (either just set or existing for inquiries).
+
+         Once set, the property cannot be removed again (which is no problem).
+
+
+    .. method:: getSigFlags()
+
+      PDF only: Return whether the document contains signature fields. This is an optional PDF property: if not present (return value -1), no conclusions can be drawn -- the PDF creator may just not have bothered to use it.
+
+      :rtype: int
+      :returns:
+         * -1: not a Form PDF / no signature fields recorded / no *SigFlags* found.
+         * 1: at least one signature field exists.
+         * 3: contains signatures that may be invalidated if the file is saved (written) in a way that alters its previous contents, as opposed to an incremental update.
+
+    .. index::
+       pair: filename; embeddedFileAdd (Document method)
+       pair: ufilename; embeddedFileAdd (Document method)
+       pair: desc; embeddedFileAdd (Document method)
+
+    .. method:: embeddedFileAdd(name, buffer, filename=None, ufilename=None, desc=None)
+
+      PDF only: Embed a new file. All string parameters except the name may be unicode (in previous versions, only ASCII worked correctly). File contents will be compressed (where beneficial).
+
+      Changed in version 1.14.16
+         The sequence of positional parameters "name" and "buffer" has been changed to comply with the layout of other functions.
+
+      :arg str name: entry identifier, must not already exist.
+      :arg bytes,bytearray,BytesIO buffer: file contents.
+
+         *(Changed in version 1.14.13)* *io.BytesIO* is now also supported.
+
+      :arg str filename: optional filename. Documentation only, will be set to *name* if *None*.
+      :arg str ufilename: optional unicode filename. Documentation only, will be set to *filename* if *None*.
+      :arg str desc: optional description. Documentation only, will be set to *name* if *None*.
+
+
+    .. method:: embeddedFileCount()
+
+      PDF only: Return the number of embedded files.
+
+         Changed in version 1.14.16
+            This is now a method. In previous versions, this was a property.
+
+    .. method:: embeddedFileGet(item)
+
+      PDF only: Retrieve the content of an embedded file by its entry number or name. If the document is not a PDF, or the entry cannot be found, an exception is raised.
+
+      :arg int,str item: index or name of entry. An integer must be in *range(embeddedFileCount())*.
+
+      :rtype: bytes
+
+    .. method:: embeddedFileDel(item)
+
+      PDF only: Remove an entry from `/EmbeddedFiles`. As always, physical deletion of the embedded file content (and reclaiming of the file space) will occur only when the document is saved to a new file with a suitable garbage option.
+
+         Changed in version 1.14.16
+            Items can now be deleted by index, too.
+
+      :arg int/str item: index or name of entry.
+
+      .. warning:: When specifying an entry name, this function will only **delete the first item** with that name. Be aware that PDFs not created with PyMuPDF may contain duplicate names. So you may want to take appropriate precautions.
+
+    .. method:: embeddedFileInfo(item)
+
+      PDF only: Retrieve information about an embedded file given by its number or by its name.
+
+      :arg int/str item: index or name of entry. An integer must be in *range(embeddedFileCount())*.
+
+      :rtype: dict
+      :returns: a dictionary with the following keys:
+
+          * *name* -- (*str*) name under which this entry is stored
+          * *filename* -- (*str*) filename
+          * *ufilename* -- (*unicode*) filename
+          * *desc* -- (*str*) description
+          * *size* -- (*int*) original file size
+          * *length* -- (*int*) compressed file length
+
+    .. method:: embeddedFileNames()
+
+      *(New in version 1.14.16)*
+      
+      PDF only: Return a list of embedded file names. The sequence of names equals the physical sequence in the document.
+
+      :rtype: list
+
+    .. index::
+       pair: filename; embeddedFileUpd (Document method)
+       pair: ufilename; embeddedFileUpd (Document method)
+       pair: desc; embeddedFileUpd (Document method)
+
+    .. method:: embeddedFileUpd(item, buffer=None, filename=None, ufilename=None, desc=None)
+
+      PDF only: Change an embedded file given its entry number or name. All parameters are optional. Letting them default leads to a no-operation.
+
+      :arg int/str item: index or name of entry. An integer must be in *range(0, embeddedFileCount())*.
+      :arg bytes,bytearray,BytesIO buffer: the new file content.
+
+         *(Changed in version 1.14.13)* *io.BytesIO* is now also supported.
+
+      :arg str filename: the new filename.
+      :arg str ufilename: the new unicode filename.
+      :arg str desc: the new description.
+
+    .. method:: embeddedFileSetInfo(n, filename=None, ufilename=None, desc=None)
+
+      PDF only: Change embedded file meta information. All parameters are optional. Letting them default will lead to a no-operation.
+
+      :arg int,str n: index or name of entry. An integer must be in *range(embeddedFileCount())*.
+      :arg str filename: sets the filename.
+      :arg str ufilename: sets the unicode filename.
+      :arg str desc: sets the description.
+
+      .. note:: Deprecated subset of :meth:`embeddedFileUpd`. Will be deleted in a future version.
+
+    .. method:: close()
+
+      Release objects and space allocations associated with the document. If created from a file, also closes *filename* (releasing control to the OS).
+
+    .. method:: xrefObject(xref, compressed=False, ascii=False)
+
+      *(New in version 1.16.8)*
+      
+      PDF only: Return the definition of a PDF object. For details please refer to :meth:`Document.xrefObject`.
+  
+    .. method:: PDFCatalog()
+      
+      *(New in version 1.16.8)*
+      
+      PDF only: Return the :data:`xref` of the PDF catalog (or root) object. For details please refer to :meth:`Document._getPDFroot`.
+
+
+    .. method:: PDFTrailer(compressed=False)
+
+      *(New in version 1.16.8)*
+      
+      PDF only: Return the trailer of the PDF (UTF-8), which is usually located at the PDF file's end. For details please refer to :meth:`Document._getTrailerString`.
+
+
+    .. method:: metadataXML()
+
+      *(New in version 1.16.8)*
+      
+      PDF only: Return the :data:`xref` of the document's XML metadata. For details please refer to :meth:`Document._getXmlMetadataXref`.
+
+    .. method:: xrefStream(xref)
+
+      *(New in version 1.16.8)*
+      
+      PDF only: Return the **decompressed** contents of the :data:`xref` stream object. For details please refer to :meth:`Document._getXrefStream`.
+
+    .. method:: xrefStreamRaw(xref)
+
+      *(New in version 1.16.8)*
+      
+      PDF only: Return the **unmodified** contents of the :data:`xref` stream object. Otherwise equal to :meth:`Document.xrefStream`.
+ 
+    .. method:: updateObject(xref, obj_str, page=None)
+
+      *(New in version 1.16.8)*
+      
+      PDF only: Update object at :data:`xref`. For details please refer to :meth:`Document._updateObject`.
+
+    .. method:: updateStream(xref, data, new=False)
+
+      *(New in version 1.16.8)*
+      
+      PDF only: Replace the stream at :data:`xref`. For details please refer to :meth:`Document._updateStream`.
+
+
+    .. attribute:: outline
+
+      Contains the first :ref:`Outline` entry of the document (or *None*). Can be used as a starting point to walk through all outline items. Accessing this property for encrypted, not authenticated documents will raise an *AttributeError*.
+
+      :type: :ref:`Outline`
+
+    .. attribute:: isClosed
+
+      *False* if document is still open. If closed, most other attributes and methods will have been deleted / disabled. In addition, :ref:`Page` objects referring to this document (i.e. created with :meth:`Document.loadPage`) and their dependent objects will no longer be usable. For reference purposes, :attr:`Document.name` still exists and will contain the filename of the original document (if applicable).
+
+      :type: bool
+
+    .. attribute:: isPDF
+
+      *True* if this is a PDF document, else *False*.
+
+      :type: bool
+
+    .. attribute:: isFormPDF
+
+      *False* if this is not a PDF or has no form fields, otherwise the number of root form fields (fields with no ancestors).
+
+      Changed in version 1.16.4 Returns the total number of (root) form fields.
+
+      :type: bool,int
+
+    .. attribute:: isReflowable
+
+      *True* if document has a variable page layout (like e-books or HTML). In this case you can set the desired page dimensions during document creation (open) or via method :meth:`layout`.
+
+      :type: bool
+
+    .. attribute:: needsPass
+
+      Indicates whether the document is password-protected against access. This indicator remains unchanged -- **even after the document has been authenticated**. Precludes incremental saves if true.
+
+      :type: bool
+
+    .. attribute:: isEncrypted
+
+      This indicator initially equals *needsPass*. After successful authentication, it is set to *False* to reflect the situation.
+
+      :type: bool
+
+    .. attribute:: permissions
+
+      Contains the permissions to access the document. This is an integer containing bool values in respective bit positions. For example, if *doc.permissions & fitz.PDF_PERM_MODIFY > 0*, you may change the document. See :ref:`PermissionCodes` for details.
+
+      Changed in version 1.16.0 This is now an integer comprised of bit indicators. Was a dictionary previously.
+
+      :type: int
+
+    .. attribute:: metadata
+
+      Contains the document's metadata as a Python dictionary or *None* (if *isEncrypted=True* and *needsPass=True*). Keys are *format*, *encryption*, *title*, *author*, *subject*, *keywords*, *creator*, *producer*, *creationDate*, *modDate*. All item values are strings or *None*.
+
+      Except *format* and *encryption*, for PDF documents, the key names correspond in an obvious way to the PDF keys */Creator*, */Producer*, */CreationDate*, */ModDate*, */Title*, */Author*, */Subject*, and */Keywords* respectively.
+
+      - *format* contains the document format (e.g. 'PDF-1.6', 'XPS', 'EPUB').
+
+      - *encryption* either contains *None* (no encryption), or a string naming an encryption method (e.g. *'Standard V4 R4 128-bit RC4'*). Note that an encryption method may be specified **even if** *needsPass=False*. In such cases, probably not all permissions have been granted. Check :attr:`Document.permissions` for details.
+
+      - If the date fields contain valid data (which need not be the case at all!), they are strings in the PDF-specific timestamp format "D:<TS><TZ>", where
+
+          - <TS> is the 14 character ISO timestamp *YYYYMMDDhhmmss* (*YYYY* - year, *MM* - month, *DD* - day, *hh* - hour, *mm* - minute, *ss* - second), and
+
+          - <TZ> is a time zone value (time interval relative to GMT) containing a sign ('+' or '-'), the hour (*hh*), and the minute (*'mm'*, note the apostrophes!).
+
+      - A Paraguayan value might hence look like *D:20150415131602-04'00'*, which corresponds to the timestamp April 15, 2015, at 1:16:02 pm local time Asuncion.
+
+      :type: dict
+
+    .. Attribute:: name
+
+      Contains the *filename* or *filetype* value with which *Document* was created.
+
+      :type: str
+
+    .. Attribute:: pageCount
+
+      Contains the number of pages of the document. May return 0 for documents with no pages. Function *len(doc)* will also deliver this result.
+
+      :type: int
+
+    .. Attribute:: chapterCount
+      
+      *(New in version 1.17.0)*
+      Contains the number of chapters in the document. Always at least 1. Relevant only for document types with chapter support (EPUB currently). Other documents will return 1.
+
+      :type: int
+
+    .. Attribute:: lastLocation
+
+      *(New in version 1.17.0)*
+      Contains (chapter, pno) of the document's last page. Relevant only for document types with chapter support (EPUB currently). Other documents return *(0, len(doc) - 1)*, or *(0, -1)* if they have no pages.
+
+      :type: tuple
+
+    .. Attribute:: FormFonts
+
+      A list of form field font names defined in the */AcroForm* object. *None* if not a PDF.
+
+      :type: list
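+
+Both :attr:`Document.permissions` and the result of :meth:`Document.getSigFlags` are integers whose meaning is encoded in individual bits. The following pure-Python sketch decodes them without needing a document. The permission bit values are taken from the PDF specification's permission flags. It is an assumption that the *fitz.PDF_PERM_...* constants use the same values. *granted_permissions* and *describe_sig_flags* are hypothetical helpers, not PyMuPDF functions.
+
```python
# Hypothetical helpers, not PyMuPDF functions. Bit values follow the PDF
# specification's permission flags (bits 3-12, counted from 1); that the
# fitz.PDF_PERM_* constants use the same values is an assumption.
PERM_BITS = {
    "print": 1 << 2,
    "modify": 1 << 3,
    "copy": 1 << 4,
    "annotate": 1 << 5,
    "form": 1 << 8,
    "accessibility": 1 << 9,
    "assemble": 1 << 10,
    "print_hq": 1 << 11,
}


def granted_permissions(permissions):
    """Names of all permission bits that are set in the integer `permissions`."""
    return [name for name, bit in PERM_BITS.items() if permissions & bit]


def describe_sig_flags(flags):
    """Interpret a getSigFlags() result: bit 1 = signatures exist, bit 2 = append only."""
    if flags == -1:
        # No SigFlags: no conclusion possible. Checked first because -1 has all bits set.
        return "unknown"
    parts = []
    if flags & 1:
        parts.append("signature fields exist")
    if flags & 2:
        parts.append("append only")  # saving non-incrementally may invalidate signatures
    return ", ".join(parts) or "no flags set"


print(granted_permissions(4 | 16))  # ['print', 'copy']
print(describe_sig_flags(3))  # signature fields exist, append only
```
+
+Note that *-1* (the default *permissions* argument of :meth:`write`) has every bit set, so :func:`granted_permissions` reports all permissions for it.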
+
+.. NOTE:: For methods that change the structure of a PDF (:meth:`insertPDF`, :meth:`select`, :meth:`copyPage`, :meth:`deletePage` and others), be aware that objects or properties in your program may have been invalidated or orphaned. Examples are :ref:`Page` objects and their children (links, annotations, widgets), variables holding old page counts, tables of contents and the like. Remember to keep such variables up to date or delete orphaned objects. Also refer to :ref:`ReferenialIntegrity`.
+
+:meth:`setMetadata` Example
+-------------------------------
+Clear metadata information. If you do this out of privacy / data protection concerns, make sure you save the document as a new file with *garbage > 0*. Only then will the old */Info* object also be physically removed from the file. In this case, you may also want to clear any XML metadata inserted by some PDF editors:
+
+>>> import fitz
+>>> doc=fitz.open("pymupdf.pdf")
+>>> doc.metadata             # look at what we currently have
+{'producer': 'rst2pdf, reportlab', 'format': 'PDF 1.4', 'encryption': None, 'author':
+'Jorj X. McKie', 'modDate': "D:20160611145816-04'00'", 'keywords': 'PDF, XPS, EPUB, CBZ',
+'title': 'The PyMuPDF Documentation', 'creationDate': "D:20160611145816-04'00'",
+'creator': 'sphinx', 'subject': 'PyMuPDF 1.9.1'}
+>>> doc.setMetadata({})      # clear all fields
+>>> doc.metadata             # look again to show what happened
+{'producer': 'none', 'format': 'PDF 1.4', 'encryption': None, 'author': 'none',
+'modDate': 'none', 'keywords': 'none', 'title': 'none', 'creationDate': 'none',
+'creator': 'none', 'subject': 'none'}
+>>> doc._delXmlMetadata()    # clear any XML metadata
+>>> doc.save("anonymous.pdf", garbage = 4)       # save anonymized doc
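+
+The date strings shown above use the "D:<TS><TZ>" format described for :attr:`Document.metadata`. If you need them as Python objects, a conversion could look like the following sketch. *pdf_date_to_datetime* is a hypothetical helper, not part of PyMuPDF. It only accepts complete 14-digit timestamps and returns *None* for anything else, such as the *'none'* values left behind by *setMetadata({})*.
+
```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper, not part of PyMuPDF. Pattern: "D:" + YYYYMMDDhhmmss,
# optionally followed by "Z" or a zone like -04'00' (sign, hh, 'mm').
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:([+-])(\d{2})'(\d{2})'?|Z)?$"
)


def pdf_date_to_datetime(value):
    """Convert a PDF date string to a datetime; return None if it is not valid."""
    m = _PDF_DATE.match(value or "")
    if m is None:
        return None  # e.g. 'none', an empty field, or a truncated date
    year, month, day, hh, mm, ss = (int(g) for g in m.group(1, 2, 3, 4, 5, 6))
    sign, tz_h, tz_m = m.group(7, 8, 9)
    try:
        if sign is not None:
            offset = timedelta(hours=int(tz_h), minutes=int(tz_m))
            tz = timezone(-offset if sign == "-" else offset)
        elif value.endswith("Z"):
            tz = timezone.utc
        else:
            tz = None  # no time zone given: return a naive datetime
        return datetime(year, month, day, hh, mm, ss, tzinfo=tz)
    except ValueError:
        return None  # digits present but not a real date, e.g. month 13


print(pdf_date_to_datetime("D:20150415131602-04'00'"))  # 2015-04-15 13:16:02-04:00
print(pdf_date_to_datetime("none"))  # None
```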
+
+:meth:`setToC` Demonstration
+----------------------------------
+This shows how to modify or add a table of contents. Also have a look at `csv2toc.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/csv2toc.py>`_ and `toc2csv.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/toc2csv.py>`_ in the examples directory.
+
+>>> import fitz
+>>> doc = fitz.open("test.pdf")
+>>> toc = doc.getToC()
+>>> for t in toc: print(t)                           # show what we have
+[1, 'The PyMuPDF Documentation', 1]
+[2, 'Introduction', 1]
+[3, 'Note on the Name fitz', 1]
+[3, 'License', 1]
+>>> toc[1][1] += " modified by setToC"               # modify something
+>>> doc.setToC(toc)                                  # replace outline tree
+3                                                    # number of bookmarks inserted
+>>> for t in doc.getToC(): print(t)                  # demonstrate it worked
+[1, 'The PyMuPDF Documentation', 1]
+[2, 'Introduction modified by setToC', 1]            # <<< this has changed
+[3, 'Note on the Name fitz', 1]
+[3, 'License', 1]
+
+:meth:`insertPDF` Examples
+----------------------------
+**(1) Concatenate two documents including their TOCs:**
+
+>>> import fitz
+>>> doc1 = fitz.open("file1.pdf")  # must be a PDF
+>>> doc2 = fitz.open("file2.pdf")  # must be a PDF
+>>> pages1 = len(doc1)  # save doc1's page count
+>>> toc1 = doc1.getToC(False)  # save TOC 1
+>>> toc2 = doc2.getToC(False)  # save TOC 2
+>>> doc1.insertPDF(doc2)  # doc2 at end of doc1
+>>> for t in toc2:  # increase toc2 page numbers
+...     t[2] += pages1  # by old len(doc1)
+>>> doc1.setToC(toc1 + toc2)  # now result has total TOC
+
+The same approach works in more general situations. Just make sure that the hierarchy level never increases by more than one from one TOC entry to the next. Inserting dummy bookmarks before and after *toc2* segments would fix such cases. A ready-to-use GUI (wxPython) solution can be found in script `PDFjoiner.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/PDFjoiner.py>`_ of the examples directory.
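+
+Both the page-number shift and the hierarchy rule can be checked in plain Python before calling :meth:`setToC`. The sketch below works on TOC lists with items *[level, title, page, ...]*. Any extra items, as in detailed TOCs, are carried along unchanged. *shift_toc* and *toc_levels_ok* are hypothetical helpers. The requirement that the first entry be on level 1 is an assumption about what *setToC* accepts.
+
```python
# Hypothetical helpers, not part of PyMuPDF. A TOC is a list of
# [level, title, page, ...] items, as returned by doc.getToC().


def shift_toc(toc, offset):
    """Return a new TOC with every page number increased by offset."""
    return [[lvl, title, page + offset] + rest for lvl, title, page, *rest in toc]


def toc_levels_ok(toc):
    """True if the TOC starts at level 1 and no level exceeds its predecessor by more than 1."""
    previous = 0  # forces the first entry to be on level 1
    for lvl, *_ in toc:
        if lvl < 1 or lvl > previous + 1:
            return False
        previous = lvl
    return True


toc1 = [[1, "Part A", 1], [2, "Chapter A.1", 2]]
toc2 = [[1, "Part B", 1], [2, "Chapter B.1", 3]]
joined = toc1 + shift_toc(toc2, 10)  # assume doc1 had 10 pages before insertPDF
print(joined)
print(toc_levels_ok(joined))  # True
print(toc_levels_ok([[1, "Part A", 1], [3, "Too deep", 2]]))  # False
```
+
+Unlike the interactive example above, *shift_toc* returns a new list instead of changing *toc2* in place. This keeps the original TOC available if the join has to be repeated.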
+
+**(2) More examples:**
+
+>>> # insert 5 pages of doc2, where its page 21 becomes page 15 in doc1
+>>> doc1.insertPDF(doc2, from_page=21, to_page=25, start_at=15)
+
+>>> # same example, but pages are rotated and copied in reverse order
+>>> doc1.insertPDF(doc2, from_page=25, to_page=21, start_at=15, rotate=90)
+
+>>> # put copied pages in front of doc1
+>>> doc1.insertPDF(doc2, from_page=21, to_page=25, start_at=0)
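+
+The page selection rules of :meth:`insertPDF` can be modelled in a few lines: negative values mean "default", and pages are copied in reverse order if *from_page > to_page*. This is a hypothetical sketch of the documented behavior, not PyMuPDF's implementation. In particular, raising *ValueError* for out-of-range numbers is an assumption.
+
```python
# Hypothetical model of the documented insertPDF page-range rules.
# It does not call PyMuPDF and is not its implementation.


def copied_pages(page_count, from_page=-1, to_page=-1):
    """Return the source page numbers in the order in which they would be copied."""
    last = page_count - 1
    if from_page < 0:
        from_page = 0  # default: first page of docsrc
    if to_page < 0:
        to_page = last  # default: last page of docsrc
    if not (0 <= from_page <= last and 0 <= to_page <= last):
        raise ValueError("page number out of range")  # assumption: no clamping
    step = 1 if from_page <= to_page else -1  # from_page > to_page: reverse order
    return list(range(from_page, to_page + step, step))


print(copied_pages(100, 21, 25))  # [21, 22, 23, 24, 25]
print(copied_pages(100, 25, 21))  # [25, 24, 23, 22, 21]
```
+
+With *start_at=15*, the five pages of the first example become pages 15 to 19 of *doc1*.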
+
+Other Examples
+----------------
+**Extract all page-referenced images of a PDF into separate PNG files**::
+
+ import fitz
+
+ doc = fitz.open("input.pdf")  # placeholder: the PDF to extract images from
+ for i in range(len(doc)):
+     imglist = doc.getPageImageList(i)
+     for img in imglist:
+         xref = img[0]                  # xref number
+         pix = fitz.Pixmap(doc, xref)   # make pixmap from image
+         if pix.n - pix.alpha < 4:      # can be saved as PNG
+             pix.writePNG("p%s-%s.png" % (i, xref))
+         else:                          # CMYK: must convert first
+             pix0 = fitz.Pixmap(fitz.csRGB, pix)
+             pix0.writePNG("p%s-%s.png" % (i, xref))
+             pix0 = None                # free Pixmap resources
+         pix = None                     # free Pixmap resources
+
+**Rotate all pages of a PDF:**
+
+>>> for page in doc: page.setRotation(90)
+
+.. rubric:: Footnotes
+
+.. [#f1] Content streams describe what (e.g. text or images) appears where and how on a page. PDF uses a specialized mini language similar to PostScript to do this (pp. 985 in :ref:`AdobeManual`), which gets interpreted when a page is loaded.
+
+.. [#f2] However, you **can** use :meth:`Document.getToC` and :meth:`Page.getLinks` (which are available for all document types) and copy this information over to the output PDF. See demo `pdf-converter.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/pdf-converter.py>`_.
+
+.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in laying out a large part of the document before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use convenience methods / attributes :meth:`Document.nextLocation`, :meth:`Document.previousLocation` and :attr:`Document.lastLocation` for maintaining a high level of coding efficiency.
diff --git a/docs/faq.rst b/docs/faq.rst

new file mode 100644 (file)

index 0000000..db6cc13
--- /dev/null
+++ b/docs/faq.rst
@@ -0,0 +1,2135 @@
+.. _FAQ:
+
+==============================
+Collection of Recipes
+==============================
+
+.. highlight:: python
+
+A collection of recipes in "How-To" format for using PyMuPDF. We aim to extend this section over time. Where appropriate we will refer to the corresponding `Wiki <https://github.com/pymupdf/PyMuPDF/wiki>`_ pages, but some duplication may still occur.
+
+----------
+
+Images
+-------
+
+----------
+
+How to Make Images from Document Pages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This little script will take a document filename and generate a PNG file from each of its pages.
+
+The document can be any supported type like PDF, XPS, etc.
+
+The script works as a command line tool which expects the filename as a command line argument. The generated image files (one per page) are stored in the current working directory::
+
+    import sys, fitz  # import the binding
+    fname = sys.argv[1]  # get filename from command line
+    doc = fitz.open(fname)  # open document
+    for page in doc:  # iterate through the pages
+        pix = page.getPixmap(alpha = False)  # render page to an image
+        pix.writePNG("page-%i.png" % page.number)  # store image as a PNG
+
+The current directory will now contain PNG image files named *page-0.png*, *page-1.png*, etc. The images have the dimensions of their pages, e.g. 595 x 842 pixels for an A4 portrait sized page. They will have a resolution of 72 dpi in x and y dimension and have no transparency. You can change all that -- for how to do this, read the next sections.
+
+----------
+
+How to Increase :index:`Image Resolution <pair: image; resolution>`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The image of a document page is represented by a :ref:`Pixmap`, and the simplest way to create a pixmap is via method :meth:`Page.getPixmap`.
+
+This method has many options for influencing the result. The most important among them is the :ref:`Matrix`, which lets you :index:`zoom`, rotate, distort or mirror the outcome.
+
+:meth:`Page.getPixmap` by default will use the :ref:`Identity` matrix, which does nothing.
+
+In the following, we apply a :index:`zoom factor <pair: resolution;zoom>` of 2 to each dimension. This doubles the resolution in each direction (144 dpi instead of 72 dpi), so the image has four times as many pixels (and is also about 4 times the size)::
+
+    zoom_x = 2.0  # horizontal zoom
+    zoom_y = 2.0  # vertical zoom
+    mat = fitz.Matrix(zoom_x, zoom_y)  # zoom factor 2 in each dimension
+    pix = page.getPixmap(matrix = mat)  # use 'mat' instead of the identity matrix
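+
+The identity matrix renders at 72 dpi, so a zoom factor is simply the desired resolution divided by 72. The sketch below computes the zoom for a target dpi and the approximate pixel size of the resulting image. It uses plain Python with no PyMuPDF calls, and both helper names are hypothetical. Rounding to whole pixels is an assumption, because MuPDF's own rounding may differ by one pixel.
+
```python
# Plain-Python sketch; zoom_for_dpi and pixel_size are hypothetical helpers.


def zoom_for_dpi(dpi, base_dpi=72):
    """Zoom factor that changes the default 72 dpi rendering to `dpi`."""
    return dpi / base_dpi


def pixel_size(width_pt, height_pt, zoom_x, zoom_y):
    """Approximate pixel width and height of a page rendered with these zoom factors."""
    # Assumption: round to whole pixels; MuPDF's own rounding may differ by one pixel.
    return round(width_pt * zoom_x), round(height_pt * zoom_y)


zoom = zoom_for_dpi(144)  # 2.0, the zoom factor used above
w, h = pixel_size(595, 842, zoom, zoom)  # A4 portrait, in points
print(zoom, w, h, (w * h) / (595 * 842))  # 2.0 1190 1684 4.0
```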
+
+
+----------
+
+How to Create :index:`Partial Pixmaps` (Clips)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+You do not always need the full image of a page. This may be the case e.g. when you display the image in a GUI and would like to zoom into a part of the page.
+
+Let's assume your GUI window has room to display a full document page, but you now want to fill this room with the bottom right quarter of your page, thus using twice the resolution in each direction.
+
+To achieve this, we define a rectangle equal to the area we want to appear in the GUI and call it "clip". One way of constructing rectangles in PyMuPDF is by providing two diagonally opposite corners, which is what we are doing here.
+
+.. image:: images/img-clip.jpg
+   :scale: 80
+
+::
+
+    mat = fitz.Matrix(2, 2)  # zoom factor 2 in each direction
+    rect = page.rect  # the page rectangle
+    mp = rect.tl + (rect.br - rect.tl) * 0.5  # its middle point
+    clip = fitz.Rect(mp, rect.br)  # the area we want
+    pix = page.getPixmap(matrix=mat, clip=clip)
+
+In the above we construct *clip* by specifying two diagonally opposite points: the middle point *mp* of the page rectangle, and its bottom right, *rect.br*.
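+
+The following sketch repeats this computation with plain tuples to show why it works. The clip is exactly a quarter of the page, so rendering it with zoom factor 2 yields as many pixels as the full page at zoom factor 1. It is illustrative only and the helper name is hypothetical. Real code would use *fitz.Rect* and point arithmetic as shown above.
+
```python
# Plain-tuple sketch of the clip computation above (no PyMuPDF needed):
# rectangles are (x0, y0, x1, y1) tuples, in points.


def bottom_right_quarter(rect):
    """Rectangle from the middle point of rect to its bottom-right corner."""
    x0, y0, x1, y1 = rect
    # middle point, like rect.tl + (rect.br - rect.tl) * 0.5
    mp = (x0 + (x1 - x0) * 0.5, y0 + (y1 - y0) * 0.5)
    return (mp[0], mp[1], x1, y1)


page_rect = (0, 0, 595, 842)  # A4 portrait page
clip = bottom_right_quarter(page_rect)
zoom = 2
clip_pixels = ((clip[2] - clip[0]) * zoom, (clip[3] - clip[1]) * zoom)
print(clip, clip_pixels)  # (297.5, 421.0, 595, 842) (595.0, 842.0)
```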
+
+----------
+
+How to Create or Suppress Annotation Images
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Normally, the pixmap of a page also shows the page's annotations. Occasionally, this may not be desirable.
+
+To suppress the annotation images on a rendered page, just specify *annots=False* in :meth:`Page.getPixmap`.
+
+You can also render annotations separately: :ref:`Annot` objects have their own :meth:`Annot.getPixmap` method. The resulting pixmap has the same dimensions as the annotation rectangle.
+
+----------
+
+.. index::
+   triple: extract;image;non-PDF
+   pair: convertToPDF;examples
+
+How to Extract Images: Non-PDF Documents
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In contrast to the previous sections, this section deals with **extracting** the images **contained** in a document, i.e. the images that are displayed on one or more of its pages.
+
+If you want to recreate the original image in file form or as a memory area, you have basically two options:
+
+1. Convert your document to a PDF, and then use one of the PDF-only extraction methods. This snippet will convert a document to PDF::
+
+    >>> pdfbytes = doc.convertToPDF()  # this is a bytes object
+    >>> pdf = fitz.open("pdf", pdfbytes)  # open it as a PDF document
+    >>> # now use 'pdf' like any PDF document
+
+2. Use :meth:`Page.getText` with the "dict" parameter. This will extract all text and images shown on the page, formatted as a Python dictionary. Every image will occur in an image block, containing meta information and the binary image data. For details of the dictionary's structure, see :ref:`TextPage`. The method works equally well for PDF files. This creates a list of all images shown on a page::
+
+    >>> d = page.getText("dict")
+    >>> blocks = d["blocks"]
+    >>> imgblocks = [b for b in blocks if b["type"] == 1]
+
+Each item of "imgblocks" is a dictionary which looks like this::
+
+    {"type": 1, "bbox": (x0, y0, x1, y1), "width": w, "height": h, "ext": "png", "image": b"..."}
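+
+To store these images as files, each block's *ext* value can serve as the file extension. The sketch below assumes a dictionary structured as shown above. *save_image_blocks* is a hypothetical helper. The sample dictionary, with fake image bytes, only stands in for a real *page.getText("dict")* result.
+
```python
import os
import tempfile

# Sketch: write the image blocks of a page.getText("dict") result to files.
# save_image_blocks is a hypothetical helper, not a PyMuPDF function.


def save_image_blocks(d, directory, prefix="img"):
    """Write every image block (type 1) to directory and return the file paths."""
    paths = []
    image_blocks = [b for b in d["blocks"] if b["type"] == 1]
    for i, block in enumerate(image_blocks):
        path = os.path.join(directory, "%s-%i.%s" % (prefix, i, block["ext"]))
        with open(path, "wb") as f:
            f.write(block["image"])  # binary image data in the format named by "ext"
        paths.append(path)
    return paths


# Stand-in for page.getText("dict"), with fake image bytes for illustration.
d = {"blocks": [
    {"type": 0, "bbox": (0, 0, 10, 10), "lines": []},
    {"type": 1, "bbox": (0, 0, 10, 10), "width": 1, "height": 1,
     "ext": "png", "image": b"\x89PNG..."},
]}
with tempfile.TemporaryDirectory() as tmp:
    print([os.path.basename(p) for p in save_image_blocks(d, tmp)])  # ['img-0.png']
```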
+
+----------
+
+.. index::
+   triple: extract;image;PDF
+   pair: extractImage;examples
+
+How to Extract Images: PDF Documents
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Like any other "object" in a PDF, images are identified by a cross reference number (:data:`xref`, an integer). If you know this number, you have two ways to access the image's data:
+
+1. **Create** a :ref:`Pixmap` of the image with the instruction *pix = fitz.Pixmap(doc, xref)*. This method is **very** fast (single-digit microseconds). The pixmap's properties (width, height, ...) will reflect those of the image. In this case, there is no way to tell which image format the embedded original has.
+
+2. **Extract** the image with *img = doc.extractImage(xref)*. This returns a dictionary containing the binary image data as *img["image"]*. Some metadata are also provided, mostly the same as you would find in the pixmap of the image. The major difference is the string *img["ext"]*, which specifies the image format: apart from "png", strings like "jpeg", "bmp", "tiff", etc. can also occur. Use this string as the file extension if you want to store the image on disk. The execution speed of this method should be compared to the combined speed of the statements *pix = fitz.Pixmap(doc, xref);pix.getPNGData()*. If the embedded image is in PNG format, the speed of :meth:`Document.extractImage` is about the same (and the binary image data are identical). Otherwise, this method is **thousands of times faster**, and the **image data is much smaller**.
+
+The question remains: **"How do I know those 'xref' numbers of images?"**. There are two answers to this:
+
+a. **"Inspect the page objects:"** Loop through the items of :meth:`Page.getImageList`. It is a list of lists, and its items look like *[xref, smask, ...]*, containing the :data:`xref` of an image. This :data:`xref` can then be used with one of the above methods. Use this method for **valid (undamaged)** documents. Be aware, however, that the same image may be referenced multiple times (by different pages), so you may want to provide a mechanism that avoids extracting it more than once.
+b. **"No need to know:"** Loop through the list of **all xrefs** of the document and perform a :meth:`Document.extractImage` for each one. If the returned dictionary is empty, skip it: this :data:`xref` is not an image. Use this method if the PDF is **damaged (unusable pages)**. Note that a PDF often contains "pseudo-images" ("stencil masks") with the special purpose of defining the transparency of some other image. You may want to provide logic to exclude those from extraction. Also have a look at the next section.
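
The two approaches might be sketched as follows. This is not the code of the scripts linked below; the function names are hypothetical, and the sketch assumes (as described above) that items of :meth:`Document.getPageImageList` start with *[xref, smask, ...]* and that :meth:`Document.extractImage` returns an empty dictionary for non-images. The second function additionally drops images that serve as another image's *"smask"*.

```python
def unique_page_image_xrefs(doc):
    """Approach (a): collect image xrefs page by page, without duplicates.

    Returns (xref, smask) tuples in the order of first occurrence.
    """
    seen = set()
    result = []
    for pno in range(len(doc)):
        for item in doc.getPageImageList(pno):
            xref, smask = item[0], item[1]
            if xref in seen:  # already referenced by an earlier page
                continue
            seen.add(xref)
            result.append((xref, smask))
    return result


def xref_table_images(doc):
    """Approach (b): try every xref and keep those that are images.

    Returns a dict {xref: extractImage() result}, without stencil masks.
    """
    images = {}
    for xref in range(1, doc.xrefLength()):  # xref 0 is never an object
        img = doc.extractImage(xref)
        if img:  # an empty dict means: not an image
            images[xref] = img
    # xrefs used as "smask" by some other image are pseudo-images
    masks = {img.get("smask", 0) for img in images.values()}
    return {x: img for x, img in images.items() if x not in masks}
```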
+
+For both extraction approaches, there exist ready-to-use general purpose scripts:
+
+`extract-imga.py <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/extract-imga.py>`_ extracts images page by page:
+
+.. image:: images/img-extract-imga.jpg
+   :scale: 80
+
+and `extract-imgb.py <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/extract-imgb.py>`_ extracts images by xref table:
+
+.. image:: images/img-extract-imgb.jpg
+   :scale: 80
+
+----------
+
+How to Handle Stencil Masks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Some images in PDFs are accompanied by **stencil masks**. In their simplest form, stencil masks represent alpha (transparency) bytes stored as separate images. To reconstruct the original of an image that has a stencil mask, it must be "enriched" with the transparency bytes taken from its stencil mask.
+
+Whether an image does have such a stencil mask can be recognized in one of two ways in PyMuPDF:
+
+1. An item of :meth:`Document.getPageImageList` has the general format *[xref, smask, ...]*, where *xref* is the image's :data:`xref` and *smask*, if positive, is the :data:`xref` of a stencil mask.
+2. The (dictionary) results of :meth:`Document.extractImage` have a key *"smask"*, which also contains any stencil mask's :data:`xref` if positive.
+
+If *smask == 0* then the image encountered via :data:`xref` can be processed as it is.
+
+To recover the original image using PyMuPDF, the following procedure must be executed:
+
+.. image:: images/img-stencil.jpg
+   :scale: 60
+
+::
+
+    pix1 = fitz.Pixmap(doc, xref)    # (1) pixmap of image w/o alpha
+    pix2 = fitz.Pixmap(doc, smask)   # (2) stencil pixmap
+    pix = fitz.Pixmap(pix1)          # (3) copy of pix1, empty alpha channel added
+    pix.setAlpha(pix2.samples)       # (4) fill alpha channel
+
+Step (1) creates a pixmap of the "net" image, i.e. without transparency. Step (2) does the same with the stencil mask. Please note that the :attr:`Pixmap.samples` attribute of *pix2* contains the alpha bytes that must be stored in the final pixmap. This is done in steps (3) and (4): step (3) copies *pix1* and adds an empty alpha channel, and step (4) fills that channel with the bytes of *pix2*.
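
Conceptually, steps (3) and (4) attach one alpha byte to every pixel. The following pure-Python sketch (an illustration, not PyMuPDF's implementation) shows that interleaving for RGB samples, assuming the stencil pixmap has exactly one byte per pixel:

```python
def add_alpha(samples, alpha, n=3):
    """Interleave one alpha byte after the n color bytes of every pixel.

    'samples' holds n bytes per pixel (n=3 for RGB); 'alpha' holds one
    byte per pixel, like the samples of a stencil-mask pixmap.
    """
    if len(samples) != n * len(alpha):
        raise ValueError("alpha must contain exactly one byte per pixel")
    out = bytearray()
    for i, a in enumerate(alpha):
        out += samples[i * n:(i + 1) * n]  # the pixel's color bytes
        out.append(a)                      # followed by its alpha byte
    return bytes(out)
```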
+
+The scripts `extract-imga.py <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/extract-imga.py>`_ and `extract-imgb.py <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/extract-imgb.py>`_ mentioned above also contain this logic.
+
+----------
+
+.. index::
+   triple: picture;embed;PDF
+   pair: showPDFpage;examples
+   pair: insertImage;examples
+   pair: embeddedFileAdd;examples
+   pair: addFileAnnot;examples
+
+How to Make one PDF of all your Pictures (or Files)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We show here **three scripts** that take a list of (image and other) files and put them all in one PDF.
+
+**Method 1: Inserting Images as Pages**
+
+The first one converts each image to a PDF page with the same dimensions. The result will be a PDF with one page per image. It will only work for supported image file formats::
+
+ import os, fitz
+ import PySimpleGUI as psg  # for showing a progress bar
+ doc = fitz.open()  # PDF with the pictures
+ imgdir = "D:/2012_10_05"  # where the pics are
+ imglist = os.listdir(imgdir)  # list of them
+ imgcount = len(imglist)  # pic count
+
+ for i, f in enumerate(imglist):
+     img = fitz.open(os.path.join(imgdir, f))  # open pic as document
+     rect = img[0].rect  # pic dimension
+     pdfbytes = img.convertToPDF()  # make a PDF stream
+     img.close()  # no longer needed
+     imgPDF = fitz.open("pdf", pdfbytes)  # open stream as PDF
+     page = doc.newPage(width = rect.width,  # new page with ...
+                        height = rect.height)  # pic dimension
+     page.showPDFpage(rect, imgPDF, 0)  # image fills the page
+     psg.EasyProgressMeter("Import Images",  # show our progress
+         i+1, imgcount)
+
+ doc.save("all-my-pics.pdf")
+
+This will generate a PDF only marginally larger than the combined pictures' size. Some numbers on performance:
+
+The above script needed about 1 minute on my machine for 149 pictures with a total size of 514 MB (and about the same resulting PDF size).
+
+.. image:: images/img-import-progress.jpg
+   :scale: 80
+
+Look `here <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/all-my-pics-inserted.py>`_ for a more complete source code: it offers a directory selection dialog and skips unsupported files and non-file entries.
+
+.. note:: We might have used :meth:`Page.insertImage` instead of :meth:`Page.showPDFpage`, and the result would have been a similar looking file. However, depending on the image type, it may store **images uncompressed**. Therefore, the save option *deflate = True* must be used to achieve a reasonable file size, which hugely increases the runtime for large numbers of images. So this alternative **cannot be recommended** here.
+
+**Method 2: Embedding Files**
+
+The second script **embeds** arbitrary files -- not only images. The resulting PDF will have just one (empty) page, required for technical reasons. To later access the embedded files again, you would need a suitable PDF viewer that can display and / or extract embedded files::
+
+ import os, fitz
+ import PySimpleGUI as psg  # for showing progress bar
+ doc = fitz.open()  # PDF with the pictures
+ imgdir = "D:/2012_10_05"  # where my files are
+
+ imglist = os.listdir(imgdir)  # list of pictures
+ imgcount = len(imglist)  # pic count
+ imglist.sort()  # nicely sort them
+
+ for i, f in enumerate(imglist):
+     with open(os.path.join(imgdir, f), "rb") as infile:
+         img = infile.read()  # make pic stream (file is closed again)
+     doc.embeddedFileAdd(f, img, filename=f,  # entry name first, then content
+                         ufilename=f, desc=f)
+     psg.EasyProgressMeter("Embedding Files",  # show our progress
+         i+1, imgcount)
+
+ page = doc.newPage()  # at least 1 page is needed
+
+ doc.save("all-my-pics-embedded.pdf")
+
+.. image:: images/img-embed-progress.jpg
+   :scale: 80
+
+This is by far the fastest method, and it also produces the smallest possible output file size. The above pictures needed 20 seconds on my machine and yielded a PDF size of 510 MB. Look `here <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/all-my-pics-embedded.py>`_ for a more complete source code: it offers a directory selection dialog and skips non-file entries.
+
+**Method 3: Attaching Files**
+
+A third way to achieve this task is **attaching files** via page annotations; see `here <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/all-my-pics-attached.py>`_ for the complete source code.
+
+This has a similar performance as the previous script and it also produces a similar file size. It will produce PDF pages which show a 'FileAttachment' icon for each attached file.
+
+.. image:: images/img-attach-result.jpg
+
+.. note:: Both the **embed** and the **attach** methods can be used for **arbitrary files**, not just images.
+
+.. note:: We strongly recommend using the awesome package `PySimpleGUI <https://pypi.org/project/PySimpleGUI/>`_ to display a progress meter for tasks that may run for an extended time span. It's pure Python, uses Tkinter (no additional GUI package) and requires just one more line of code!
+
+----------
+
+.. index::
+   triple: vector;image;SVG
+   pair: showPDFpage;examples
+   pair: insertImage;examples
+   pair: embeddedFileAdd;examples
+
+How to Create Vector Images
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The usual way to create an image from a document page is :meth:`Page.getPixmap`. A pixmap represents a raster image, so you must decide on its quality (i.e. resolution) at creation time. It cannot be changed later.
+
+PyMuPDF also offers a way to create a **vector image** of a page in SVG format (scalable vector graphics, defined in XML syntax). SVG images remain precise across zooming levels (of course with the exception of any raster graphic elements embedded therein).
+
+Instruction *svg = page.getSVGimage(matrix = fitz.Identity)* delivers a UTF-8 string *svg* which can be stored with extension ".svg".
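
A small sketch for storing the SVG, with an optional zoom matrix. The helper `save_svg` is hypothetical; *page* is assumed to be a :ref:`Page` and *matrix* something like *fitz.Matrix(2, 2)* for 200% zoom.

```python
def save_svg(page, filename, matrix=None):
    """Write the SVG image of 'page' to 'filename'; return its length.

    'matrix' is passed on only if given; otherwise the method's
    default (fitz.Identity) applies.
    """
    if matrix is None:
        svg = page.getSVGimage()
    else:
        svg = page.getSVGimage(matrix=matrix)
    with open(filename, "w", encoding="utf-8") as out:
        out.write(svg)  # svg is a str; store it as UTF-8
    return len(svg)
```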
+
+----------
+
+.. index::
+   pair: writeImage;examples
+   pair: getImageData;examples
+   pair: Photoshop;examples
+   pair: Postscript;examples
+   pair: JPEG;examples
+   pair: PhotoImage;examples
+
+How to Convert Images
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Image conversion is just one feature among many in PyMuPDF, but it is easy to use and, in many cases, makes other graphics packages like PIL/Pillow unnecessary.
+
+That said, interfacing with Pillow is almost trivial.
+
+================= ================== =========================================
+**Input Formats** **Output Formats** **Description**
+================= ================== =========================================
+BMP               .                  Windows Bitmap
+JPEG              .                  Joint Photographic Experts Group
+JXR               .                  JPEG Extended Range
+JPX               .                  JPEG 2000
+GIF               .                  Graphics Interchange Format
+TIFF              .                  Tagged Image File Format
+PNG               PNG                Portable Network Graphics
+PNM               PNM                Portable Anymap
+PGM               PGM                Portable Graymap
+PBM               PBM                Portable Bitmap
+PPM               PPM                Portable Pixmap
+PAM               PAM                Portable Arbitrary Map
+.                 PSD                Adobe Photoshop Document
+.                 PS                 Adobe Postscript
+================= ================== =========================================
+
+The general scheme is just the following two lines::
+
+    pix = fitz.Pixmap("input.xxx")  # any supported input format
+    pix.writeImage("output.yyy")  # any supported output format
+
+**Remarks**
+
+1. The **input** argument of *fitz.Pixmap(arg)* can be a file or a bytes / io.BytesIO object containing an image.
+2. Instead of an output **file**, you can also create a bytes object via *pix.getImageData("yyy")* and pass this around.
+3. As a matter of course, input and output formats must be compatible in terms of colorspace and transparency. The *Pixmap* class has batteries included if adjustments are needed.
+
+.. note::
+        **Convert JPEG to Photoshop**::
+
+          pix = fitz.Pixmap("myfamily.jpg")
+          pix.writeImage("myfamily.psd")
+
+
+.. note::
+        **Save to JPEG** using PIL/Pillow::
+
+          from PIL import Image
+          pix = fitz.Pixmap(...)
+          img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
+          img.save("output.jpg", "JPEG")
+
+.. note::
+        Convert **JPEG to Tkinter PhotoImage**. Any **RGB / no-alpha** image works exactly the same. Conversion to one of the **Portable Anymap** formats (PPM, PGM, etc.) does the trick, because they are supported by all Tkinter versions::
+
+          if str is bytes:  # this is Python 2!
+              import Tkinter as tk
+          else:  # Python 3 or later!
+              import tkinter as tk
+          pix = fitz.Pixmap("input.jpg")  # or any RGB / no-alpha image
+          tkimg = tk.PhotoImage(data=pix.getImageData("ppm"))
+
+.. note::
+        Convert **PNG with alpha** to Tkinter PhotoImage. This requires **removing the alpha bytes**, before we can do the PPM conversion::
+
+          if str is bytes:  # this is Python 2!
+              import Tkinter as tk
+          else:  # Python 3 or later!
+              import tkinter as tk
+          pix = fitz.Pixmap("input.png")  # may have an alpha channel
+          if pix.alpha:  # we have an alpha channel!
+              pix = fitz.Pixmap(pix, 0)  # remove it
+          tkimg = tk.PhotoImage(data=pix.getImageData("ppm"))
+
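Why does the PPM detour work? A binary PPM file is just a short text header followed by the raw RGB bytes, which is the layout of :attr:`Pixmap.samples` for an RGB pixmap without alpha. The sketch below builds such a file by hand and also shows, byte by byte, what removing the alpha channel means. It is for illustration only; that :meth:`Pixmap.getImageData` writes exactly this header is an assumption.

```python
def make_ppm(width, height, samples):
    """Build a binary PPM ("P6") image from raw RGB bytes.

    'samples' must hold 3 bytes per pixel, row by row, as Pixmap.samples
    does for an RGB pixmap without alpha.
    """
    if len(samples) != 3 * width * height:
        raise ValueError("expected 3 bytes per pixel")
    header = b"P6\n%d %d\n255\n" % (width, height)  # magic, size, max value
    return header + bytes(samples)


def strip_alpha(samples, n=4):
    """Drop the last (alpha) byte of every n-byte pixel, e.g. RGBA -> RGB."""
    return b"".join(samples[i:i + n - 1] for i in range(0, len(samples), n))
```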
+----------
+
+.. index::
+   pair: copyPixmap;examples
+
+How to Use Pixmaps: Glueing Images
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This shows how pixmaps can be used for purely graphical, non-document purposes. The script reads an image file and creates a new image which consists of 3 * 4 tiles of the original::
+
+ import fitz
+ src = fitz.Pixmap("img-7edges.png")      # create pixmap from a picture
+ col = 3                                  # tiles per row
+ lin = 4                                  # tiles per column
+ tar_w = src.width * col                  # width of target
+ tar_h = src.height * lin                 # height of target
+
+ # create target pixmap
+ tar_pix = fitz.Pixmap(src.colorspace, (0, 0, tar_w, tar_h), src.alpha)
+
+ # now fill target with the tiles
+ for i in range(col):
+     src.x = src.width * i                # modify input's x coord
+     for j in range(lin):
+         src.y = src.height * j           # modify input's y coord
+         tar_pix.copyPixmap(src, src.irect) # copy input to new loc
+
+ tar_pix.writePNG("tar.png")
+
+This is the input picture:
+
+.. image:: images/img-7edges.png
+   :scale: 33
+
+Here is the output:
+
+.. image:: images/img-target.png
+   :scale: 33
+
+----------
+
+.. index::
+   pair: setRect;examples
+   pair: invertIRect;examples
+   pair: copyPixmap;examples
+   pair: writeImage;examples
+
+How to Use Pixmaps: Making a Fractal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here is another Pixmap example that creates **Sierpinski's Carpet**, a fractal generalizing the **Cantor Set** to two dimensions. Given a square carpet, mark its 9 sub-squares (3 times 3) and cut out the one in the center. Treat each of the remaining eight sub-squares in the same way, and continue *ad infinitum*. The end result is a set with area zero and fractal dimension 1.8928...
+
+This script creates an approximate PNG image of it by going down to one-pixel granularity. To increase the precision, increase the value of n (the recursion depth)::
+
+    import fitz, time
+    if not list(map(int, fitz.VersionBind.split("."))) >= [1, 14, 8]:
+        raise SystemExit("need PyMuPDF v1.14.8 for this script")
+    n = 6                             # depth (precision)
+    d = 3**n                          # edge length
+
+    t0 = time.perf_counter()
+    ir = (0, 0, d, d)                 # the pixmap rectangle
+
+    pm = fitz.Pixmap(fitz.csRGB, ir, False)
+    pm.setRect(pm.irect, (255,255,0)) # fill it with some background color
+
+    color = (0, 0, 255)               # color to fill the punch holes
+
+    # alternatively, define a 'fill' pixmap for the punch holes
+    # this could be anything, e.g. some photo image ...
+    fill = fitz.Pixmap(fitz.csRGB, ir, False) # same size as 'pm'
+    fill.setRect(fill.irect, (0, 255, 255))   # put some color in
+
+    def punch(x, y, step):
+        """Recursively "punch a hole" in the central square of a pixmap.
+        
+        Arguments are top-left coords and the step width.
+
+        Some alternative punching methods are commented out.
+        """
+        s = step // 3                 # the new step
+        # iterate through the 9 sub-squares
+        # the central one will be filled with the color
+        for i in range(3):
+            for j in range(3):
+                if i != j or i != 1:  # this is not the central square
+                    if s >= 3:        # recursing needed?
+                        punch(x+i*s, y+j*s, s)       # recurse
+                else:                 # punching alternatives are:
+                    pm.setRect((x+s, y+s, x+2*s, y+2*s), color)     # fill with a color
+                    #pm.copyPixmap(fill, (x+s, y+s, x+2*s, y+2*s))  # copy from fill
+                    #pm.invertIRect((x+s, y+s, x+2*s, y+2*s))       # invert colors
+
+        return
+
+    #==============================================================================
+    # main program
+    #==============================================================================
+    # now start punching holes into the pixmap
+    punch(0, 0, d)
+    t1 = time.perf_counter()
+    pm.writeImage("sierpinski-punch.png")
+    t2 = time.perf_counter()
+    print ("%g sec to create / fill the pixmap" % round(t1-t0,3))
+    print ("%g sec to save the image" % round(t2-t1,3))
+
+The result should look something like this:
+
+.. image:: images/img-sierpinski.png
+   :scale: 33
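
The punching logic can also be expressed as a per-pixel test: looking at the base-3 digits of *x* and *y*, a pixel is removed as soon as both digits at the same position equal 1, because it then lies in a central sub-square at that level. The sketch below (pure Python, independent of the script above) uses this to count the remaining pixels, which should be 8**n for depth n, and computes the fractal dimension log 8 / log 3.

```python
import math


def in_carpet(x, y):
    """Return True if pixel (x, y) survives the punching."""
    while x > 0 or y > 0:
        if x % 3 == 1 and y % 3 == 1:  # central sub-square at this level
            return False
        x //= 3  # move on to the next coarser level
        y //= 3
    return True


def carpet_pixels(n):
    """Count the surviving pixels of a 3**n by 3**n carpet."""
    d = 3 ** n
    return sum(in_carpet(x, y) for x in range(d) for y in range(d))


# 8 self-similar copies, each scaled down by a factor of 3
DIMENSION = math.log(8) / math.log(3)
```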
+
+----------
+
+How to Interface with NumPy
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This shows how to create a PNG file from a numpy array (several times faster than most other methods)::
+
+ import numpy as np
+ import fitz
+ #==============================================================================
+ # create a fun-colored width * height PNG with fitz and numpy
+ #==============================================================================
+ height = 150
+ width  = 100
+ bild = np.ndarray((height, width, 3), dtype=np.uint8)
+
+ for i in range(height):
+     for j in range(width):
+         # one pixel (some fun coloring)
+         bild[i, j] = [(i+j)%256, i%256, j%256]
+
+ samples = bytearray(bild.tobytes())    # get plain pixel data from numpy array
+ pix = fitz.Pixmap(fitz.csRGB, width, height, samples, alpha=False)
+ pix.writePNG("test.png")
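
The nested Python loops above are the slow part. As a sketch, NumPy broadcasting can build the same array in one step, and *np.frombuffer* can go the other way, viewing the samples of a pixmap as an array. The helper names are hypothetical; the conversion assumes the samples hold *n* bytes per pixel, row by row.

```python
import numpy as np


def fun_colors(height, width):
    """Vectorized equivalent of the nested loops above (same pixel values)."""
    i = np.arange(height).reshape(-1, 1)  # row indices as a column vector
    j = np.arange(width).reshape(1, -1)   # column indices as a row vector
    bild = np.empty((height, width, 3), dtype=np.uint8)
    bild[..., 0] = (i + j) % 256          # broadcasts to (height, width)
    bild[..., 1] = i % 256                # same value along each row
    bild[..., 2] = j % 256                # same value along each column
    return bild


def samples_to_array(samples, width, height, n):
    """View pixmap-style samples (n bytes per pixel, row by row) as an array."""
    return np.frombuffer(samples, dtype=np.uint8).reshape(height, width, n)
```

Used with the script above, *bytearray(fun_colors(height, width).tobytes())* replaces the loops, and *samples_to_array(pix.samples, pix.width, pix.height, pix.n)* turns a pixmap back into an array. The array returned by *np.frombuffer* may be read-only; call *.copy()* before modifying it.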
+
+
+----------
+
+How to Add Images to a PDF Page
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are two methods to add images to a PDF page: :meth:`Page.insertImage` and :meth:`Page.showPDFpage`. Both methods have things in common, but there also exist differences.
+
+============================== ===================================== =========================================
+**Criterion**                  :meth:`Page.insertImage`              :meth:`Page.showPDFpage`
+============================== ===================================== =========================================
+displayable content            image file, image in memory, pixmap   PDF page
+display resolution             image resolution                      vectorized (except raster page content)
+rotation                       multiple of 90 degrees                any angle
+clipping                       no (full image only)                  yes
+keep aspect ratio              yes (default option)                  yes (default option)
+transparency (water marking)   depends on image                      yes
+location / placement           scaled to fit target rectangle        scaled to fit target rectangle
+performance                    automatic prevention of duplicates;   automatic prevention of duplicates;
+                               MD5 calculation on every execution    faster than :meth:`Page.insertImage`
+multi-page image support       no                                    yes
+ease of use                    simple, intuitive;                    simple, intuitive;
+                               performance considerations apply      **usable for all document types**
+                               for multiple insertions of same image (including images!) after conversion to
+                                                                     PDF via :meth:`Document.convertToPDF`
+============================== ===================================== =========================================
+
+Basic code pattern for :meth:`Page.insertImage`. **Exactly one** of the parameters **filename / stream / pixmap** must be given::
+
+    page.insertImage(
+        rect,                  # where to place the image (rect-like)
+        filename=None,         # image in a file
+        stream=None,           # image in memory (bytes)
+        pixmap=None,           # image from pixmap
+        rotate=0,              # rotate (int, multiple of 90)
+        keep_proportion=True,  # keep aspect ratio
+        overlay=True,          # put in foreground
+    )
+
+Basic code pattern for :meth:`Page.showPDFpage`. Source and target PDF must be different :ref:`Document` objects (but may be opened from the same file)::
+
+    page.showPDFpage(
+        rect,                  # where to place the image (rect-like)
+        src,                   # source PDF
+        pno=0,                 # page number in source PDF
+        clip=None,             # only display this area (rect-like)
+        rotate=0,              # rotate (float, any value)
+        keep_proportion=True,  # keep aspect ratio
+        overlay=True,          # put in foreground
+    )
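
Both methods scale the content to fit the target rectangle when *keep_proportion=True*. The sketch below computes such a fitted rectangle in plain Python to show the arithmetic; centering the result within the target is an assumption made for this illustration and may differ from PyMuPDF's actual placement.

```python
def fit_rect(img_w, img_h, target):
    """Largest rect with the image's aspect ratio inside 'target', centered.

    'target' and the result are (x0, y0, x1, y1) tuples.
    """
    x0, y0, x1, y1 = target
    tw, th = x1 - x0, y1 - y0
    scale = min(tw / img_w, th / img_h)  # the limiting dimension wins
    w, h = img_w * scale, img_h * scale
    dx, dy = (tw - w) / 2, (th - h) / 2  # center in the free dimension
    return (x0 + dx, y0 + dy, x0 + dx + w, y0 + dy + h)
```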
+
+Text
+-----
+
+----------
+
+How to Extract all Document Text
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This script will take a document filename and generate a text file from all of its text.
+
+The document can be any supported type like PDF, XPS, etc.
+
+The script works as a command line tool which expects the document filename supplied as a parameter. It generates one text file, named like the document with ".txt" appended (e.g. "input.pdf.txt"), in the same directory as the document. The text of each page is followed by a form feed character (0x0C) as page delimiter::
+
+    import sys, fitz
+    fname = sys.argv[1]  # get document filename
+    doc = fitz.open(fname)  # open document
+    out = open(fname + ".txt", "wb")  # open text output
+    for page in doc:  # iterate the document pages
+        text = page.getText().encode("utf8")  # get plain text, encoded as UTF-8
+        out.write(text)  # write text of page
+        out.write(bytes((12,)))  # write page delimiter (form feed 0x0C)
+    out.close()
+
+The output will be plain text as it is coded in the document. No effort is made to prettify it in any way. Specifically for PDF, this may mean output not in the usual reading order, unexpected line breaks and so forth.
+
+You have many options to cure this -- see chapter :ref:`Appendix2`. Among them are:
+
+1. Extract text in HTML format and store it as an HTML document, so it can be viewed in any browser.
+2. Extract text as a list of text blocks via *Page.getText("blocks")*. Each item of this list contains position information for its text, which can be used to establish a convenient reading order.
+3. Extract a list of single words via *Page.getText("words")*. Its items are words with position information. Use it to determine text contained in a given rectangle -- see next section.
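
As an illustration of option 2, a simple heuristic sorts blocks top to bottom and then left to right, treating blocks with nearly equal top edges as one row. This is a sketch, not a PyMuPDF function; it assumes the block format *(x0, y0, x1, y1, text, ...)* and will not handle multi-column layouts that must be read column by column.

```python
def blocks_in_reading_order(blocks, tolerance=3):
    """Return blocks sorted top to bottom, then left to right.

    Blocks whose top edge is at most 'tolerance' points below the top
    edge of a row's first block are put into that row.
    """
    rows = []
    for b in sorted(blocks, key=lambda b: b[1]):  # by top edge
        if rows and b[1] - rows[-1][0][1] <= tolerance:
            rows[-1].append(b)  # close enough: same row
        else:
            rows.append([b])    # start a new row
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]
```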
+
+See the following two sections for examples and further explanations.
+
+
+.. index::
+   triple: extract;text;rectangle
+
+How to Extract Text from within a Rectangle
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Please refer to the script `textboxtract.py <https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/examples/textboxtract.py>`_.
+
+It demonstrates ways to extract text contained in the following red rectangle,
+
+.. image:: images/img-textboxtract.png
+   :scale: 75
+
+.. highlight:: text
+
+by using more or less restrictive conditions to find the relevant words::
+
+    Select the words strictly contained in rectangle
+    ------------------------------------------------
+    Die Altersübereinstimmung deutete darauf hin,
+    engen, nur 50 Millionen Jahre großen
+    Gesteinshagel auf den Mond traf und dabei
+    hinterließ – einige größer als Frankreich.
+    es sich um eine letzte, infernalische Welle
+    Geburt des Sonnensystems. Daher tauften die
+    das Ereignis »lunare Katastrophe«. Später
+    die Bezeichnung Großes Bombardement durch.
+
+Or, less restrictively::
+
+    Select the words intersecting the rectangle
+    -------------------------------------------
+    Die Altersübereinstimmung deutete darauf hin, dass
+    einem engen, nur 50 Millionen Jahre großen Zeitfenster
+    ein Gesteinshagel auf den Mond traf und dabei unzählige
+    Krater hinterließ – einige größer als Frankreich. Offenbar
+    handelte es sich um eine letzte, infernalische Welle nach
+    der Geburt des Sonnensystems. Daher tauften die Caltech-
+    Forscher das Ereignis »lunare Katastrophe«. Später setzte
+    sich die Bezeichnung Großes Bombardement durch.
+
+The latter output also includes words *intersecting* the rectangle.
+
+.. highlight:: python
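
The two selection rules can be stated in a few lines of plain Python. This sketch uses bare tuples instead of :ref:`Rect` objects and is not the code of *textboxtract.py*; *words* is assumed to have the format of :meth:`Page.getTextWords`, i.e. *(x0, y0, x1, y1, word, ...)*.

```python
def words_in_rect(words, rect, strict=True):
    """Select words by their bbox relative to rect = (x0, y0, x1, y1).

    strict=True keeps words fully contained in rect; strict=False keeps
    every word whose bbox overlaps rect.
    """
    rx0, ry0, rx1, ry1 = rect
    selected = []
    for w in words:
        x0, y0, x1, y1 = w[:4]
        if strict:  # containment
            ok = rx0 <= x0 and ry0 <= y0 and x1 <= rx1 and y1 <= ry1
        else:       # intersection
            ok = x0 < rx1 and rx0 < x1 and y0 < ry1 and ry0 < y1
        if ok:
            selected.append(w)
    return selected
```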
+
+What if your **rectangle spans across more than one page**? Follow this recipe:
+
+* Create a common list of all words of all pages which your rectangle intersects.
+* When adding word items to this common list, increase their **y-coordinates** by the accumulated height of all previous pages.
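
A sketch of this recipe, assuming each page's word list has the format of :meth:`Page.getTextWords` and that the pages are stacked vertically in the given order. The function name is hypothetical; the page number is appended to each item so that words can be mapped back to their page.

```python
def stack_words(pages_words):
    """Merge word lists of consecutive pages into one coordinate system.

    'pages_words' is a list of (page_height, words) pairs; each word is
    (x0, y0, x1, y1, word, ...). Returns (x0, y0, x1, y1, word, pno).
    """
    merged = []
    offset = 0  # accumulated height of all previous pages
    for pno, (height, words) in enumerate(pages_words):
        for w in words:
            x0, y0, x1, y1, text = w[:5]
            merged.append((x0, y0 + offset, x1, y1 + offset, text, pno))
        offset += height
    return merged
```

For example, *stack_words([(page.rect.height, page.getTextWords()) for page in pages])*, after which a rectangle in the same stacked coordinates can select words across the page boundary.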
+
+
+----------
+
+.. index::
+    pair: text;reading order
+
+How to Extract Text in Natural Reading Order
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One of the common issues with PDF text extraction is that text may not appear in any particular reading order.
+
+The PDF creator (software or a human) is responsible for this effect. For instance, page headers may have been inserted in a separate step, after the document had been produced. In such a case, the header text will appear at the end of a page text extraction (although it will be shown correctly by PDF viewer software). For example, the following snippet will add some header and footer lines to an existing PDF::
+
+    import fitz
+
+    doc = fitz.open("some.pdf")
+    header = "Header"  # text in header
+    footer = "Page %i of %i"  # text in footer
+    for page in doc:
+        page.insertText((50, 50), header)  # insert header
+        page.insertText(  # insert footer 50 points above page bottom
+            (50, page.rect.height - 50),
+            footer % (page.number + 1, len(doc)),
+        )
+
+The text sequence extracted from a page modified in this way will look like this:
+
+1. original text
+2. header line
+3. footer line
+
+PyMuPDF has several means to re-establish some reading sequence or even to re-generate a layout close to the original.
+
+As a starting point, take the above-mentioned `script <https://github.com/pymupdf/PyMuPDF/wiki/How-to-extract-text-from-a-rectangle>`_ and use the full page rectangle.
+
+On rare occasions, when the PDF creator has been "over-creative", extracted text does not even keep the correct reading sequence of **single letters**: instead of the two words "DELUXE PROPERTY" you might get an anagram consisting of 8 words like "DEL", "XE", "P", "OP", "RTY", "U", "R" and "E".
+
+Such a PDF may not be searchable by all PDF viewers, even though it is displayed correctly and looks harmless.
+
+In those cases, the following function will help recompose the original words of the page. The resulting list is also searchable and can be used to deliver rectangles for the found text locations::
+
+    from operator import itemgetter
+    from itertools import groupby
+    import fitz
+
+    def recover(words, rect):
+        """ Word recovery.
+
+        Notes:
+            Method 'getTextWords()' does not try to recover words, if their single
+            letters do not appear in correct lexical order. This function steps in
+            here and creates a new list of recovered words.
+        Args:
+            words: list of words as created by 'getTextWords()'
+            rect: rectangle to consider (usually the full page)
+        Returns:
+            List of recovered words. Same format as 'getTextWords', but left out
+            block, line and word number - a list of items of the following format:
+            [x0, y0, x1, y1, "word"]
+        """
+        # build my sublist of words contained in given rectangle
+        mywords = [w for w in words if fitz.Rect(w[:4]) in rect]
+
+        # sort the words by lower line, then by word start coordinate
+        mywords.sort(key=itemgetter(3, 0))  # sort by y1, x0 of word rectangle
+
+        # build word groups on same line
+        grouped_lines = groupby(mywords, key=itemgetter(3))
+
+        words_out = []  # we will return this
+
+        # iterate through the grouped lines
+        # for each line coordinate ("_"), the list of words is given
+        for _, words_in_line in grouped_lines:
+            for i, w in enumerate(words_in_line):
+                if i == 0:  # store first word
+                    x0, y0, x1, y1, word = w[:5]
+                    continue
+
+                r = fitz.Rect(w[:4])  # word rect
+
+                # Compute word distance threshold as 20% of width of 1 letter.
+                # So we should be safe joining text pieces into one word if they
+                # have a distance shorter than that.
+                threshold = r.width / len(w[4]) / 5
+                if r.x0 <= x1 + threshold:  # join with previous word
+                    word += w[4]  # add string
+                    x1 = r.x1  # new end-of-word coordinate
+                    y0 = min(y0, r.y0)  # extend word rect upwards if needed
+                    continue
+
+                # now have a new word, output previous one
+                words_out.append([x0, y0, x1, y1, word])
+
+                # store the new word
+                x0, y0, x1, y1, word = w[:5]
+
+            # output word waiting for completion
+            words_out.append([x0, y0, x1, y1, word])
+
+        return words_out
+
+    def search_for(text, words):
+        """ Search for text in items of list of words
+
+        Notes:
+            Can be adjusted / extended in obvious ways, e.g. using regular
+            expressions, or being case insensitive, or only looking for complete
+            words, etc.
+        Args:
+            text: string to be searched for
+            words: list of items in format delivered by 'getTextWords()'.
+        Returns:
+            List of rectangles, one for each found location.
+        """
+        rect_list = []
+        for w in words:
+            if text in w[4]:
+                rect_list.append(fitz.Rect(w[:4]))
+
+        return rect_list
+
+
+----------
+
+How to :index:`Extract Tables <pair: extract; table>` from Documents
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you see a table in a document, you are not normally looking at something like an embedded Excel or other identifiable object. It is usually just text, formatted to look like a table.
+
+Extracting a tabular data from such a page area therefore means that you must find a way to **(1)** graphically indicate table and column borders, and **(2)** then extract text based on this information.
+
+The wxPython GUI script `wxTableExtract.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/wxTableExtract.py>`_ strives to exactly do that. You may want to have a look at it and adjust it to your liking.
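+
+Step **(2)** can be sketched in plain Python once the borders are known. The following minimal sketch assumes word items in the *(x0, y0, x1, y1, text, ...)* format of :meth:`Page.getTextWords`, a list of column x-borders, and that words whose top coordinates differ by at most a small tolerance belong to the same row. The function name *words_to_table*, the tolerance value and the sample data are hypothetical.
+
```python
def words_to_table(words, col_borders, row_tol=3):
    """Arrange words into rows and columns.

    words: (x0, y0, x1, y1, text, ...) items inside the table area.
    col_borders: increasing x coordinates separating the columns,
        e.g. [x_left, x_mid, x_right] for a 2-column table.
    row_tol: max. difference of y0 values for words in the same row.
    """
    ncols = len(col_borders) - 1
    rows = []  # list of (row_y0, [cell texts])
    for w in sorted(words, key=lambda w: (w[1], w[0])):
        x0, y0, x1, y1, text = w[:5]
        xm = (x0 + x1) / 2  # assign a word by its horizontal center
        col = None
        for i in range(ncols):
            if col_borders[i] <= xm < col_borders[i + 1]:
                col = i
                break
        if col is None:  # word lies outside all columns
            continue
        if not rows or abs(y0 - rows[-1][0]) > row_tol:
            rows.append((y0, [""] * ncols))  # start a new row
        cells = rows[-1][1]
        cells[col] = (cells[col] + " " + text).strip()
    return [cells for _, cells in rows]


sample = [
    (10, 100, 40, 110, "Name"), (60, 100, 90, 110, "Qty"),
    (10, 115, 40, 125, "apple"), (60, 116, 70, 126, "3"),
    (10, 130, 30, 140, "pear"), (32, 130, 45, 140, "tree"),
    (60, 130, 70, 140, "7"),
]
print(words_to_table(sample, [0, 50, 100]))
# [['Name', 'Qty'], ['apple', '3'], ['pear tree', '7']]
```
+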
+
+----------
+
+How to Search for and Mark Text
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There is a standard search function to search for arbitrary text on a page: :meth:`Page.searchFor`. It returns a list of :ref:`Rect` objects, each surrounding one found occurrence. These rectangles can, for example, be used to automatically insert annotations which visibly mark the found text.
+
+This method has advantages and drawbacks. Advantages:
+
+* the search string can contain blanks and wrap across lines
+* upper and lower case are treated as equal
+* the return may also be a list of :ref:`Quad` objects to precisely locate text that is **not parallel** to either axis.
+
+Disadvantages:
+
+* you cannot determine the number of found items beforehand: if *hit_max* items are returned, you do not know whether you have missed any.
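+
+One way around the *hit_max* limitation is to repeat the search with a larger limit until fewer hits than the limit come back. The following minimal sketch only assumes a callable *search(n)* returning at most *n* hits, for example a wrapper around :meth:`Page.searchFor` that passes *n* as its *hit_max* argument. The helper name *search_all* and the doubling strategy are hypothetical.
+
```python
def search_all(search, hit_max=16, limit=100000):
    """Call search(n) with growing n until the result is not truncated.

    search: callable returning at most n hits for a given n.
    limit: safety cap for the number of hits requested.
    """
    while True:
        hits = search(hit_max)
        if len(hits) < hit_max or hit_max >= limit:
            return hits  # fewer hits than allowed: nothing was cut off
        hit_max *= 2  # limit reached: maybe truncated, retry with more


# stand-in for a real page search: 40 occurrences exist
occurrences = list(range(40))


def fake_search(n):
    return occurrences[:n]


print(len(search_all(fake_search)))  # 40
```
+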
+
+But you have other options::
+
+ import os
+ import sys
+ import fitz
+
+ def mark_word(page, text):
+     """Underline each word that contains 'text'.
+     """
+     found = 0
+     wlist = page.getTextWords()        # make the word list
+     for w in wlist:                    # scan through all words on page
+         if text in w[4]:               # w[4] is the word's string
+             found += 1                 # count
+             r = fitz.Rect(w[:4])       # make rect from word bbox
+             page.addUnderlineAnnot(r)  # underline
+     return found
+
+ fname = sys.argv[1]                    # filename
+ text = sys.argv[2]                     # search string
+ doc = fitz.open(fname)
+
+ print("underlining words containing '%s' in document '%s'" % (text, doc.name))
+
+ new_doc = False                        # indicator if anything found at all
+
+ for page in doc:                       # scan through the pages
+     found = mark_word(page, text)      # mark the page's words
+     if found:                          # if anything found ...
+         new_doc = True
+         print("found '%s' %i times on page %i" % (text, found, page.number + 1))
+
+ if new_doc:
+     # prefix the file name only, not the directory part of the path
+     head, tail = os.path.split(fname)
+     doc.save(os.path.join(head, "marked-" + tail))
+
+This script uses :meth:`Page.getTextWords` to look for a string handed in as a command line parameter. This method separates a page's text into "words" using spaces and line breaks as delimiters. Therefore the words in this list contain no spaces or line breaks. Further remarks:
+
+* If found, the **complete word containing the string** is marked (underlined) -- not only the search string.
+* The search string may **not contain spaces** or other white space.
+* As shown here, upper / lower case is **respected**. But this can be changed by using the string method *lower()* (or even regular expressions) in function *mark_word*.
+* There is **no upper limit**: all occurrences will be detected.
+* You can use **anything** to mark the word: 'Underline', 'Highlight', 'StrikeThrough' or 'Square' annotations, etc.
+* Here is an example snippet of a page of this manual, where "MuPDF" has been used as the search string. Note that all strings **containing "MuPDF"** have been completely underlined (not just the search string).
+
+.. image:: images/img-markedpdf.jpg
+   :scale: 60
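+
+The remark about *lower()* and regular expressions can be illustrated on plain word items. The following minimal sketch assumes the *(x0, y0, x1, y1, word, ...)* item format of :meth:`Page.getTextWords`; the function name *matching_words* and the sample items are hypothetical. In *mark_word*, its result could take the place of the *if text in w[4]* test.
+
```python
import re


def matching_words(wlist, pattern, ignore_case=True):
    """Return the word items whose string (item[4]) matches 'pattern'.

    pattern is a regular expression; use re.escape() for a literal search.
    """
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    return [w for w in wlist if rx.search(w[4])]


wlist = [
    (10, 10, 50, 20, "PyMuPDF", 0, 0, 0),
    (55, 10, 80, 20, "uses", 0, 0, 1),
    (85, 10, 120, 20, "mupdf.", 0, 0, 2),
]
print([w[4] for w in matching_words(wlist, re.escape("MuPDF"))])
# ['PyMuPDF', 'mupdf.']
```
+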
+
+----------------------------------------------
+
+How to Analyze Font Characteristics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+To analyze the characteristics of text in a PDF use this elementary script as a starting point:
+
+.. literalinclude:: text-lister.py
+   :language: python
+
+Here is the PDF page and the script output:
+
+.. image:: images/img-pdftext.jpg
+   :scale: 80
+
+-----------------------------------------
+
+How to Insert Text
+~~~~~~~~~~~~~~~~~~~~
+PyMuPDF provides ways to insert text on new or existing PDF pages with the following features:
+
+* choose the font, including built-in fonts and fonts that are available as files
+* choose text characteristics like bold, italic, font size, font color, etc.
+* position the text in multiple ways:
+
+    - either as simple line-oriented output starting at a certain point,
+    - or fitting text in a box provided as a rectangle, in which case text alignment choices are also available,
+    - choose whether text should be put in the foreground (overlaying existing content) or in the background,
+    - all text can be arbitrarily "morphed", i.e. its appearance can be changed via a :ref:`Matrix`, to achieve effects like scaling, shearing or mirroring,
+    - independently of morphing and in addition to it, text can be rotated by integer multiples of 90 degrees.
+
+All of the above is provided by three basic :ref:`Page` methods, two of which are based on :ref:`Shape` methods:
+
+* :meth:`Page.insertFont` -- install a font for the page for later reference. The result is reflected in the output of :meth:`Document.getPageFontList`. The font can be:
+
+    - provided as a file,
+    - already present somewhere in **this or another** PDF, or
+    - be a **built-in** font.
+
+* :meth:`Page.insertText` -- write some lines of text. Internally, this uses :meth:`Shape.insertText`.
+
+* :meth:`Page.insertTextbox` -- fit text in a given rectangle. Here you can choose text alignment features (left, right, centered, justified) and you keep control as to whether text actually fits. Internally, this uses :meth:`Shape.insertTextbox`.
+
+.. note:: Both text insertion methods automatically install the font as necessary.
+
+How to Write Text Lines
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+Output some text lines on a page::
+
+    import fitz
+    doc = fitz.open(...)  # new or existing PDF
+    page = doc.newPage()  # new or existing page via doc[n]
+    p = fitz.Point(50, 72)  # start point of 1st line
+
+    text = "Some text,\nspread across\nseveral lines."
+    # the same result is achievable by
+    # text = ["Some text", "spread across", "several lines."]
+
+    rc = page.insertText(p,  # bottom-left of 1st char
+                         text,  # the text (honors '\n')
+                         fontname = "helv",  # the default font
+                         fontsize = 11,  # the default font size
+                         rotate = 0,  # also available: 90, 180, 270
+                         )
+    print("%i lines printed on page %i." % (rc, page.number))
+
+    doc.save("text.pdf")
+
+With this method, only the **number of lines** is checked against the page height: surplus lines are not written, and the number of lines actually written is returned. The calculation uses *1.2 * fontsize* as the line height and 36 points (0.5 inches) as the bottom margin.
+
+Line **width is ignored**: the part of a line that extends beyond the page border is simply invisible.
+
+However, for built-in fonts there are ways to calculate the line width beforehand; see :meth:`getTextlength`.
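+
+The line limit described above can be estimated in advance. The following minimal sketch is an approximation only: it assumes *rotate=0*, a first baseline at the insertion point's *y* coordinate, each further line *1.2 * fontsize* lower, and no baseline below page height minus 36 points. The exact rule inside PyMuPDF may differ, and the function name *max_lines* is hypothetical.
+
```python
import math


def max_lines(start_y, fontsize=11, page_height=842, bottom_margin=36):
    """Approximate number of text lines that fit below start_y (rotate=0)."""
    line_height = 1.2 * fontsize
    usable = page_height - bottom_margin - start_y  # room for lines 2, 3, ...
    if usable < 0:
        return 0  # even the first baseline is below the margin
    return math.floor(usable / line_height) + 1  # +1 for the first line


print(max_lines(72))  # A4 page, first baseline at y = 72: 56
```
+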
+
+Here is another example. It inserts 4 text strings using the four different rotation options, and thereby explains how the text insertion point must be chosen to achieve the desired result::
+
+    import fitz
+    doc = fitz.open()
+    page = doc.newPage()
+    # the text strings, each having 3 lines
+    text1 = "rotate=0\nLine 2\nLine 3"
+    text2 = "rotate=90\nLine 2\nLine 3"
+    text3 = "rotate=-90\nLine 2\nLine 3"
+    text4 = "rotate=180\nLine 2\nLine 3"
+    red = (1, 0, 0) # the color for the red dots
+    # the insertion points, each with a 25 point distance from the corners
+    p1 = fitz.Point(25, 25)
+    p2 = fitz.Point(page.rect.width - 25, 25)
+    p3 = fitz.Point(25, page.rect.height - 25)
+    p4 = fitz.Point(page.rect.width - 25, page.rect.height - 25)
+    # create a Shape to draw on
+    shape = page.newShape()
+
+    # draw the insertion points as red, filled dots
+    shape.drawCircle(p1,1)
+    shape.drawCircle(p2,1)
+    shape.drawCircle(p3,1)
+    shape.drawCircle(p4,1)
+    shape.finish(width=0.3, color=red, fill=red)
+
+    # insert the text strings
+    shape.insertText(p1, text1)
+    shape.insertText(p3, text2, rotate=90)
+    shape.insertText(p2, text3, rotate=-90)
+    shape.insertText(p4, text4, rotate=180)
+
+    # store our work to the page
+    shape.commit()
+    doc.save(...)
+
+This is the result:
+
+.. image:: images/img-inserttext.jpg
+   :scale: 33
+
+
+
+------------------------------------------
+
+How to Fill a Text Box
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+This script fills 4 different rectangles with text, each time choosing a different rotation value::
+
+    import fitz
+    doc = fitz.open(...)  # new or existing PDF
+    page = doc.newPage()  # new page, or choose doc[n]
+    r1 = fitz.Rect(50,100,100,150)  # a 50x50 rectangle
+    disp = fitz.Rect(55, 0, 55, 0)  # add this to get more rects
+    r2 = r1 + disp  # 2nd rect
+    r3 = r1 + disp * 2  # 3rd rect
+    r4 = r1 + disp * 3  # 4th rect
+    t1 = "text with rotate = 0."  # the texts we will put in
+    t2 = "text with rotate = 90."
+    t3 = "text with rotate = -90."
+    t4 = "text with rotate = 180."
+    red  = (1,0,0)  # some colors
+    gold = (1,1,0)
+    blue = (0,0,1)
+    """We use a Shape object (something like a canvas) to output the text and
+    the rectangles surrounding it for demonstration.
+    """
+    shape = page.newShape()  # create Shape
+    shape.drawRect(r1)  # draw rectangles
+    shape.drawRect(r2)  # giving them
+    shape.drawRect(r3)  # a yellow background
+    shape.drawRect(r4)  # and a red border
+    shape.finish(width = 0.3, color = red, fill = gold)
+    # Now insert text in the rectangles. Font "Helvetica" will be used
+    # by default. A return code rc < 0 indicates insufficient space (not checked here).
+    rc = shape.insertTextbox(r1, t1, color = blue)
+    rc = shape.insertTextbox(r2, t2, color = blue, rotate = 90)
+    rc = shape.insertTextbox(r3, t3, color = blue, rotate = -90)
+    rc = shape.insertTextbox(r4, t4, color = blue, rotate = 180)
+    shape.commit()  # write all stuff to page /Contents
+    doc.save("...")
+
+Several default values were used above: font "Helvetica", font size 11 and text alignment "left". The result will look like this:
+
+.. image:: images/img-textbox.jpg
+   :scale: 50
+
+------------------------------------------
+
+How to Use Non-Standard Encoding
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Since v1.14, MuPDF allows Greek and Russian encoding variants for the :data:`Base14_Fonts`. In PyMuPDF this is supported via an additional *encoding* argument. This is only relevant for Helvetica, Times-Roman and Courier (and their bold / italic forms), and only for characters outside the ASCII code range. Elsewhere, the argument is ignored. Here is how to request Russian encoding with the standard font Helvetica::
+
+    page.insertText(point, russian_text, encoding=fitz.TEXT_ENCODING_CYRILLIC)
+
+The valid encoding values are TEXT_ENCODING_LATIN (0), TEXT_ENCODING_GREEK (1), and TEXT_ENCODING_CYRILLIC (2, Russian) with Latin being the default. Encoding can be specified by all relevant font and text insertion methods.
+
+By the above statement, the fontname *helv* is automatically connected to the Russian font variant of Helvetica. Any subsequent text insertion with **this fontname** will use the Russian Helvetica encoding.
+
+If you change the fontname just slightly, you can also achieve an **encoding "mixture"** for the **same base font** on the same page::
+
+    import fitz
+    doc=fitz.open()
+    page = doc.newPage()
+    shape = page.newShape()
+    t="Sômé tèxt wìth nöñ-Lâtîn characterß."
+    shape.insertText((50,70), t, fontname="helv", encoding=fitz.TEXT_ENCODING_LATIN)
+    shape.insertText((50,90), t, fontname="HElv", encoding=fitz.TEXT_ENCODING_GREEK)
+    shape.insertText((50,110), t, fontname="HELV", encoding=fitz.TEXT_ENCODING_CYRILLIC)
+    shape.commit()
+    doc.save("t.pdf")
+
+The result:
+
+.. image:: images/img-encoding.jpg
+   :scale: 50
+
+The snippet above indeed leads to three different copies of the Helvetica font in the PDF. Each copy is uniquely identified (and referenceable) by using the correct upper-lower case spelling of the reserved word "helv"::
+
+    for f in doc.getPageFontList(0): print(f)
+
+    [6, 'n/a', 'Type1', 'Helvetica', 'helv', 'WinAnsiEncoding']
+    [7, 'n/a', 'Type1', 'Helvetica', 'HElv', 'WinAnsiEncoding']
+    [8, 'n/a', 'Type1', 'Helvetica', 'HELV', 'WinAnsiEncoding']
+
+-----------------------
+
+Annotations
+-----------
+In v1.14.0, annotation handling has been considerably extended:
+
+* New annotation type support for 'Ink', 'Rubber Stamp' and 'Squiggly' annotations. Ink annots simulate handwriting by combining one or more lists of interconnected points. Stamps are intended to visually inform about a document's status or intended usage (like "draft", "confidential", etc.). 'Squiggly' is a text marker annot, which underlines selected text with a zigzag line.
+
+* Extended 'FreeText' support:
+
+  1. all characters from the *Latin* character set are now available,
+  2. colors of text, rectangle background and rectangle border can be independently set,
+  3. text in rectangle can be rotated by either +90 or -90 degrees,
+  4. text is automatically wrapped (made multi-line) in available rectangle,
+  5. all Base-14 fonts are now available (*normal* variants only, i.e. no bold, no italic).
+
+* MuPDF now supports line end icons for 'Line' annots (only). PyMuPDF supported that in v1.13.x already, and for (almost) the full range of applicable types. So we adjusted the appearance of 'Polygon' and 'PolyLine' annots to closely resemble that of MuPDF for 'Line'.
+* MuPDF now provides its own annotation icons where relevant. PyMuPDF switched to using them (for 'FileAttachment' and 'Text' ["sticky note"] so far).
+* MuPDF now also supports 'Caret', 'Movie', 'Sound' and 'Signature' annotations, which we may include in PyMuPDF at some later time.
+
+How to Add and Modify Annotations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In PyMuPDF, new annotations can be added via :ref:`Page` methods. Once an annotation exists, it can be modified to a large extent using methods of the :ref:`Annot` class.
+
+In contrast to many other tools, annotations are initially inserted with a minimum number of properties. We leave it to the programmer to set attributes like author, creation date or subject.
+
+As an overview of these capabilities, look at the following script that fills a PDF page with most of the available annotations. The next sections cover more special situations:
+
+.. literalinclude:: new-annots.py
+   :language: python
+
+
+This script should lead to the following output:
+
+.. image:: images/img-annots.jpg
+   :scale: 80
+
+------------------------------
+
+How to Mark Text
+~~~~~~~~~~~~~~~~~~~~~
+This script searches for text and marks it::
+
+    # -*- coding: utf-8 -*-
+    import fitz
+
+    # the document to annotate
+    doc = fitz.open("tilted-text.pdf")
+
+    # the text to be marked
+    t = "¡La práctica hace el campeón!"
+
+    # work with first page only
+    page = doc[0]
+
+    # get list of text locations
+    # we use "quads", not rectangles because text may be tilted!
+    rl = page.searchFor(t, quads = True)
+
+    # mark all found quads with one annotation
+    page.addSquigglyAnnot(rl)
+
+    # save to a new PDF
+    doc.save("a-squiggly.pdf")
+
+The result looks like this:
+
+.. image:: images/img-textmarker.jpg
+   :scale: 80
+
+------------------------------
+
+How to Use FreeText
+~~~~~~~~~~~~~~~~~~~~~
+This script shows a couple of ways to deal with 'FreeText' annotations::
+
+    # -*- coding: utf-8 -*-
+    import fitz
+
+    # some colors
+    blue  = (0,0,1)
+    green = (0,1,0)
+    red   = (1,0,0)
+    gold  = (1,1,0)
+
+    # a new PDF with 1 page
+    doc = fitz.open()
+    page = doc.newPage()
+
+    # 3 rectangles, same size, above each other
+    r1 = fitz.Rect(100,100,200,150)
+    r2 = r1 + (0,75,0,75)
+    r3 = r2 + (0,75,0,75)
+
+    # the text, Latin alphabet
+    t = "¡Un pequeño texto para practicar!"
+
+    # add 3 annots, modify the last one somewhat
+    a1 = page.addFreetextAnnot(r1, t, color=red)
+    a2 = page.addFreetextAnnot(r2, t, fontname="Ti", color=blue)
+    a3 = page.addFreetextAnnot(r3, t, fontname="Co", color=blue, rotate=90)
+    a3.setBorder(width=0)
+    a3.update(fontsize=8, fill_color=gold)
+
+    # save the PDF
+    doc.save("a-freetext.pdf")
+
+The result looks like this:
+
+.. image:: images/img-freetext.jpg
+   :scale: 80
+
+------------------------------
+
+Using Buttons and JavaScript
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Since MuPDF v1.16, 'FreeText' annotations no longer support bold or italic versions of the Times-Roman, Helvetica or Courier fonts.
+
+A big **thank you** to our user `@kurokawaikki <https://github.com/kurokawaikki>`_, who contributed the following script to **circumvent this restriction**.
+
+.. literalinclude:: make-bold.py
+   :language: python
+
+--------------------------
+
+How to Use Ink Annotations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Ink annotations are used to contain freehand scribblings. A typical example may be an image of your signature consisting of first name and last name. Technically an ink annotation is implemented as a **list of lists of points**. Each point list is regarded as a continuous line connecting the points. Different point lists represent independent line segments of the annotation.
+
+The following script creates an ink annotation with two mathematical curves (sine and cosine function graphs) as line segments::
+
+    import math
+    import fitz
+
+    #------------------------------------------------------------------------------
+    # preliminary stuff: create function value lists for sine and cosine
+    #------------------------------------------------------------------------------
+    w360 = math.pi * 2  # go through full circle
+    deg = w360 / 360  # 1 degree as radians
+    rect = fitz.Rect(100,200, 300, 300)  # use this rectangle
+    first_x = rect.x0  # x starts from left
+    first_y = rect.y0 + rect.height / 2.  # rect middle means y = 0
+    x_step = rect.width / 360  # rect width means 360 degrees
+    y_scale = rect.height / 2.  # rect height means 2
+    sin_points = []  # sine values go here
+    cos_points = []  # cosine values go here
+    for x in range(362):  # now fill in the values
+        x_coord = x * x_step + first_x  # current x coordinate
+        y = -math.sin(x * deg)  # sine
+        p = (x_coord, y * y_scale + first_y)  # corresponding point
+        sin_points.append(p)  # append
+        y = -math.cos(x * deg)  # cosine
+        p = (x_coord, y * y_scale + first_y)  # corresponding point
+        cos_points.append(p)  # append
+
+    #------------------------------------------------------------------------------
+    # create the document with one page
+    #------------------------------------------------------------------------------
+    doc = fitz.open()  # make new PDF
+    page = doc.newPage()  # give it a page
+
+    #------------------------------------------------------------------------------
+    # add the Ink annotation, consisting of 2 curve segments
+    #------------------------------------------------------------------------------
+    annot = page.addInkAnnot((sin_points, cos_points))
+    # let it look a little nicer
+    annot.setBorder(width=0.3, dashes=[1,])  # line thickness, some dashing
+    annot.setColors(stroke=(0,0,1))  # make the lines blue
+    annot.update()  # update the appearance
+
+    page.drawRect(rect, width=0.3)  # only to demonstrate we did OK
+
+    doc.save("a-inktest.pdf")
+
+This is the result:
+
+.. image:: images/img-inkannot.jpg
+    :scale: 50
+
+------------------------------
+
+Drawing and Graphics
+---------------------
+
+PDF files support elementary drawing operations as part of their syntax. This includes basic geometrical objects like lines, curves, circles and rectangles, as well as the specification of colors.
+
+The syntax for such operations is defined in "A Operator Summary" on page 985 of the :ref:`AdobeManual`. These operators are specified in a page's :data:`contents` objects.
+
+PyMuPDF implements a large part of the available features via its :ref:`Shape` class, which is comparable to notions like "canvas" in other packages (e.g. `reportlab <https://pypi.org/project/reportlab/>`_).
+
+A shape is always created as a **child of a page**, usually with an instruction like *shape = page.newShape()*. The class defines numerous methods that perform drawing operations on the page's area. For example, *last_point = shape.drawRect(rect)* draws a rectangle along the borders of a suitably defined *rect = fitz.Rect(...)*.
+
+The returned *last_point* is **always** the :ref:`Point` where the drawing operation ended ("last point"). Every such elementary drawing requires a subsequent :meth:`Shape.finish` to "close" it, but multiple drawings may share one common *finish()* method.
+
+In fact, :meth:`Shape.finish` *defines* a group of preceding draw operations to form one -- potentially rather complex -- graphics object. PyMuPDF provides several predefined graphics in `shapes_and_symbols.py <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/shapes_and_symbols.py>`_ which demonstrate how this works.
+
+If you import this script, you can also directly use its graphics as in the following example::
+
+    # -*- coding: utf-8 -*-
+    """
+    Created on Sun Dec  9 08:34:06 2018
+
+    @author: Jorj
+    @license: GNU GPL 3.0+
+
+    Create a list of available symbols defined in shapes_and_symbols.py
+
+    This also demonstrates an example usage: how these symbols could be used
+    as bullet-point symbols in some text.
+
+    """
+
+    import fitz
+    import shapes_and_symbols as sas
+
+    # list of available symbol functions and their descriptions
+    tlist = [
+             (sas.arrow, "arrow (easy)"),
+             (sas.caro, "caro (easy)"),
+             (sas.clover, "clover (easy)"),
+             (sas.diamond, "diamond (easy)"),
+             (sas.dontenter, "do not enter (medium)"),
+             (sas.frowney, "frowney (medium)"),
+             (sas.hand, "hand (complex)"),
+             (sas.heart, "heart (easy)"),
+             (sas.pencil, "pencil (very complex)"),
+             (sas.smiley, "smiley (easy)"),
+             ]
+
+    r = fitz.Rect(50, 50, 100, 100)  # first rect to contain a symbol
+    d = fitz.Rect(0, r.height + 10, 0, r.height + 10)  # displacement to next rect
+    p = (15, -r.height * 0.2)  # starting point of explanation text
+    rlist = [r]  # rectangle list
+
+    for i in range(1, len(tlist)):  # fill in all the rectangles
+        rlist.append(rlist[i-1] + d)
+
+    doc = fitz.open()  # create empty PDF
+    page = doc.newPage()  # create an empty page
+    shape = page.newShape()  # start a Shape (canvas)
+
+    for i, r in enumerate(rlist):
+        tlist[i][0](shape, rlist[i])  # execute symbol creation
+        shape.insertText(rlist[i].br + p,  # insert description text
+                         tlist[i][1], fontsize=r.height/1.2)
+
+    # store everything to the page's /Contents object
+    shape.commit()
+
+    import os
+    scriptdir = os.path.dirname(__file__)
+    doc.save(os.path.join(scriptdir, "symbol-list.pdf"))  # save the PDF
+
+
+This is the script's outcome:
+
+.. image:: images/img-symbols.jpg
+   :scale: 50
+
+------------------------------
+
+Multiprocessing
+----------------
+MuPDF has no integrated support for threading: its developers call it "threading-agnostic". While there do exist tricky possibilities to still use threading with MuPDF, the baseline consequence for **PyMuPDF** is:
+
+**No Python threading support**.
+
+Using PyMuPDF in a Python threading environment will lead to blocking effects for the main thread.
+
+However, there exists the option to use Python's *multiprocessing* module in a variety of ways.
+
+If you are looking to speed up page-oriented processing for a large document, use this script as a starting point. On a multi-core machine it should be at least twice as fast as the corresponding sequential processing.
+
+.. literalinclude:: multiprocess-render.py
+   :language: python
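+
+A common approach (assumed here, not necessarily identical to the included script) is to split the page range into one contiguous segment per CPU and to let each worker process open the file itself, rather than receiving a document object. The splitting step can be sketched in plain Python; the function name *page_segments* is hypothetical. Each *(start, stop)* pair could then be handed to a worker, for example via *multiprocessing.Pool.map* together with the file name.
+
```python
import os


def page_segments(page_count, workers=None):
    """Split range(page_count) into at most 'workers' contiguous segments.

    Returns a list of (start, stop) pairs usable as range(start, stop).
    """
    if page_count <= 0:
        return []
    workers = workers or os.cpu_count() or 1
    workers = max(1, min(workers, page_count))  # avoid empty segments
    size, extra = divmod(page_count, workers)
    segments = []
    start = 0
    for i in range(workers):
        stop = start + size + (1 if i < extra else 0)  # spread the remainder
        segments.append((start, stop))
        start = stop
    return segments


print(page_segments(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```
+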
+
+Here is a more complex example involving inter-process communication between a main process (showing a GUI) and a child process doing PyMuPDF access to a document.
+
+.. literalinclude:: multiprocess-gui.py
+   :language: python
+
+------------------------------
+
+General
+--------
+
+How to Open with :index:`a Wrong File Extension <pair: wrong; file extension>`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you have a document with a wrong file extension for its type, you can still correctly open it.
+
+Assume that "some.file" is actually an XPS. Open it like so:
+
+>>> doc = fitz.open("some.file", filetype = "xps")
+
+.. note:: MuPDF itself does not try to determine the file type from the file contents. **You** are responsible for supplying the filetype info in some way, either implicitly via the file extension or explicitly as shown. There are pure Python packages like `filetype <https://pypi.org/project/filetype/>`_ that help you do this. Also consult the :ref:`Document` chapter for a full description.
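+
+If you prefer not to depend on an extra package, a few common cases can be told apart by their leading bytes. The following minimal sketch is a hypothetical helper (*guess_filetype*), not part of PyMuPDF. It recognizes PDF by its *%PDF* header; for ZIP containers it assumes that EPUB files contain a *mimetype* entry reading *application/epub+zip* and that XPS files contain a *.fdseq* part. Anything else yields *None*.
+
```python
import io
import zipfile


def guess_filetype(data):
    """Guess a filetype string ("pdf", "epub", "xps") from a file's bytes."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):  # ZIP container: inspect its entries
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = zf.namelist()
                if "mimetype" in names and zf.read("mimetype").strip() == b"application/epub+zip":
                    return "epub"
                if any(n.lower().endswith(".fdseq") for n in names):
                    return "xps"
        except zipfile.BadZipFile:
            pass  # looked like ZIP, but is not a valid archive
    return None


print(guess_filetype(b"%PDF-1.7\n"))  # pdf
```
+
+Pass the complete file content, because a ZIP archive keeps its directory at the end. A non-*None* result could then be used as the *filetype* argument shown above.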
+
+----------
+
+How to :index:`Embed or Attach Files <triple: attach;embed;file>`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+PDF supports incorporating arbitrary data. This can be done in one of two ways: "embedding" or "attaching". PyMuPDF supports both options.
+
+1. Attached Files: data are **attached to a page** by way of a *FileAttachment* annotation with this statement: *annot = page.addFileAnnot(pos, ...)*, for details see :meth:`Page.addFileAnnot`. The first parameter "pos" is the :ref:`Point`, where a "PushPin" icon should be placed on the page.
+
+2. Embedded Files: data are embedded on the **document level** via method :meth:`Document.embeddedFileAdd`.
+
+The basic differences between these options are: **(1)** you need edit permission to embed a file, but only annotation permission to attach one, and **(2)** like all annotations, attachments are visible on a page, while embedded files are not.
+
+There exist several example scripts: `embedded-list.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/embedded-list.py>`_, `new-annots.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/new-annots.py>`_.
+
+Also look at the sections above and at chapter :ref:`Appendix 3`.
+
+----------
+
+.. index::
+   pair: delete;pages
+   pair: rearrange;pages
+
+How to Delete and Re-Arrange Pages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+With PyMuPDF you have all options to copy, move, delete or re-arrange the pages of a PDF. Intuitive methods exist that allow you to do this on a page-by-page level, like the :meth:`Document.copyPage` method.
+
+Alternatively, you can prepare a complete new page layout in the form of a Python sequence that contains the page numbers you want, in the order you want, and as many times as you want each page. The following may illustrate what can be done with :meth:`Document.select`:
+
+*doc.select([1, 1, 1, 5, 4, 9, 9, 9, 0, 2, 2, 2])*
+
+Now let's prepare a PDF for double-sided printing (on a printer not directly supporting this):
+
+The number of pages is given by *len(doc)* (equal to *doc.pageCount*). The following lists contain the even and the odd (0-based) page numbers, respectively:
+
+>>> p_even = [p for p in range(len(doc)) if p % 2 == 0]
+>>> p_odd  = [p for p in range(len(doc)) if p % 2 == 1]
+
+This snippet creates the respective sub documents which can then be used to print the document:
+
+>>> fname = doc.name  # remember the file name
+>>> doc.select(p_even)  # only the even pages left over
+>>> doc.save("even.pdf")  # save the "even" PDF
+>>> doc.close()  # release the document
+>>> doc = fitz.open(fname)  # re-open the original file
+>>> doc.select(p_odd)  # and do the same with the odd pages
+>>> doc.save("odd.pdf")
+
+For more information also have a look at this Wiki `article <https://github.com/pymupdf/PyMuPDF/wiki/Rearranging-Pages-of-a-PDF>`_.
+
+
+The following example will reverse the order of all pages (**extremely fast:** sub-second time for the 1310 pages of the :ref:`AdobeManual`):
+
+>>> lastPage = len(doc) - 1
+>>> for i in range(lastPage):
+...     doc.movePage(lastPage, i)  # move current last page in front of page i
+
+This snippet duplicates the PDF with itself so that it will contain the pages *0, 1, ..., n, 0, 1, ..., n* **(extremely fast and without noticeably increasing the file size!)**:
+
+>>> pageCount = len(doc)
+>>> for i in range(pageCount):
+...     doc.copyPage(i)  # copy this page to after last page
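+
+The index arithmetic of these page operations can be checked on a plain Python list standing in for the page sequence. This minimal sketch only models the page order. It assumes that *movePage(pno, to)* places page *pno* in front of page *to* and that *copyPage(i)* appends a copy of page *i* at the end; the list names are hypothetical.
+
```python
pages = list(range(5))  # stand-in for a 5-page document: [0, 1, 2, 3, 4]

# doc.select([...]): the new document consists of exactly the listed pages
selected = [pages[i] for i in [1, 1, 4, 0]]
print(selected)  # [1, 1, 4, 0]

# reversal loop: move the current last page in front of position i
reversed_pages = pages[:]
last = len(reversed_pages) - 1
for i in range(last):
    reversed_pages.insert(i, reversed_pages.pop())
print(reversed_pages)  # [4, 3, 2, 1, 0]

# duplication loop: copy each original page to after the last page
doubled = pages[:]
for i in range(len(pages)):
    doubled.append(doubled[i])
print(doubled)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

# even / odd split for double-sided printing (0-based page numbers)
p_even = [p for p in range(len(pages)) if p % 2 == 0]
p_odd = [p for p in range(len(pages)) if p % 2 == 1]
print(p_even, p_odd)  # [0, 2, 4] [1, 3]
```
+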
+
+----------
+
+How to Join PDFs
+~~~~~~~~~~~~~~~~~~
+It is easy to join PDFs with method :meth:`Document.insertPDF`. Given open PDF documents, you can copy page ranges from one to the other. You can select the point where the copied pages should be placed, you can reverse the page sequence and also change page rotation. This Wiki `article <https://github.com/pymupdf/PyMuPDF/wiki/Inserting-Pages-from-other-PDFs>`_ contains a full description.
+
+The GUI script `PDFjoiner.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/PDFjoiner.py>`_ uses this method to join a list of files while also joining the respective table of contents segments. It looks like this:
+
+.. image:: images/img-pdfjoiner.jpg
+   :scale: 60
+
+----------
+
+How to Add Pages
+~~~~~~~~~~~~~~~~~~
+There are two methods for adding new pages to a PDF: :meth:`Document.insertPage` and :meth:`Document.newPage` (and they share a common code base).
+
+**newPage**
+
+:meth:`Document.newPage` returns the created :ref:`Page` object. Here is the call showing its defaults::
+
+ >>> doc = fitz.open(...)  # some new or existing PDF document
+ >>> page = doc.newPage(to = -1,  # insertion point: end of document
+                        width = 595,  # page dimension: A4 portrait
+                        height = 842)
+
+The above could also have been achieved with the short form *page = doc.newPage()*. The *to* parameter specifies the document's page number (0-based) **in front of which** to insert.
+
+To create a page in *landscape* format, just exchange the width and height values.
+
+Use this to create the page with another pre-defined paper format:
+
+>>> w, h = fitz.PaperSize("letter-l")  # 'Letter' landscape
+>>> page = doc.newPage(width = w, height = h)
+
+The convenience function :meth:`PaperSize` knows over 40 industry standard paper formats to choose from. To see them, inspect dictionary :attr:`paperSizes`. Pass the desired dictionary key to :meth:`PaperSize` to retrieve the paper dimensions. Upper and lower case is supported. If you append "-L" to the format name, the landscape version is returned.
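+
+The lookup rule just described (case-insensitive name, optional "-L" suffix for landscape) can be sketched in plain Python. The table below is a hypothetical two-entry excerpt using the common point sizes of A4 (595 x 842) and Letter (612 x 792). The function name *paper_size* is hypothetical, and returning *None* for unknown names is an assumption of this sketch, not necessarily the behavior of :meth:`PaperSize`.
+
```python
PAPER_SIZES = {"a4": (595, 842), "letter": (612, 792)}  # width, height in points


def paper_size(name):
    """Return (width, height) for a paper name; a '-l' suffix means landscape."""
    key = name.lower()
    landscape = key.endswith("-l")
    if landscape:
        key = key[:-2]  # strip the suffix before the lookup
    size = PAPER_SIZES.get(key)
    if size is None:
        return None  # unknown format
    width, height = size
    return (height, width) if landscape else (width, height)


print(paper_size("letter-l"))  # (792, 612)
print(paper_size("A4"))  # (595, 842)
```
+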
+
+.. note:: Here is a 3-liner that creates a PDF with one empty page. Its file size is 470 bytes:
+
+   >>> doc = fitz.open()
+   >>> doc.newPage()
+   >>> doc.save("A4.pdf")
+
+
+**insertPage**
+
+:meth:`Document.insertPage` also inserts a new page and accepts the same parameters *to*, *width* and *height*. But it also lets you insert arbitrary text into the new page, and it returns the number of inserted lines::
+
+ >>> doc = fitz.open(...)  # some new or existing PDF document
+ >>> n = doc.insertPage(to = -1,  # default insertion point
+                        text = None,  # string or sequence of strings
+                        fontsize = 11,
+                        width = 595,
+                        height = 842,
+                        fontname = "Helvetica",  # default font
+                        fontfile = None,  # any font file name
+                        color = (0, 0, 0))  # text color (RGB)
+
+The text parameter can be a string or a sequence of strings (assuming UTF-8 encoding). Insertion will start at :ref:`Point` (50, 72), which is one inch below the top of the page and 50 points from the left. The number of inserted text lines is returned. See the method definition for more details.
+
+----------
+
+How To Dynamically Clean Up Corrupt PDFs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This shows a potential use of PyMuPDF with another Python PDF library (the excellent pure Python package `pdfrw <https://pypi.python.org/pypi/pdfrw>`_ is used here as an example).
+
+If a clean, non-corrupt / decompressed PDF is needed, one could dynamically invoke PyMuPDF to recover from many problems like so::
+
+ from io import BytesIO
+ from pdfrw import PdfReader
+ import fitz
+
+ #---------------------------------------
+ # 'Tolerant' PDF reader
+ #---------------------------------------
+ def reader(fname, password=None):
+     with open(fname, "rb") as f:  # read the PDF into memory
+         idata = f.read()
+     if password is None:
+         try:
+             return PdfReader(BytesIO(idata))  # if this works: fine!
+         except Exception:
+             pass
+
+     # either we need a password or it is a problem-PDF:
+     # create a repaired / decompressed / decrypted version
+     doc = fitz.open("pdf", idata)
+     if password is not None:  # decrypt if password provided
+         rc = doc.authenticate(password)
+         if not rc > 0:
+             raise ValueError("wrong password")
+     c = doc.write(garbage=3, deflate=True)  # saved without encryption: decrypted
+     doc.close()  # release the MuPDF document
+     return PdfReader(BytesIO(c))  # let pdfrw retry
+
+ #---------------------------------------
+ # Main program
+ #---------------------------------------
+ pdf = reader("pymupdf.pdf", password=None)  # include a password if necessary
+ print(pdf.Info)
+ # do further processing
+
+With the command line utility *pdftk* (`available <https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/>`_ for Windows only, but reported to also run under `Wine <https://www.winehq.org/>`_) a similar result can be achieved, see `here <http://www.overthere.co.uk/2013/07/22/improving-pypdf2-with-pdftk/>`_. However, you must invoke it as a separate process via *subprocess.Popen*, using stdin and stdout as communication vehicles.
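+
+A minimal sketch of such a stdin/stdout round trip, using :func:`subprocess.run` (a convenience wrapper around *subprocess.Popen*). The helper name ``pipe_through`` is hypothetical, and the demo uses a stand-in Python filter as the child process so that the sketch runs anywhere. For pdftk, a command list such as ``["pdftk", "-", "output", "-", "uncompress"]`` is an assumption you should verify against the pdftk manual:
+
```python
import subprocess
import sys


def pipe_through(cmd, data):
    """Send `data` (bytes) to the child's stdin and return its stdout bytes.

    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(
        cmd,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # stand-in filter: reads stdin and writes it upper-cased to stdout
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
    print(pipe_through(upper, b"%pdf-1.7"))
```
+
+Because *check=True* turns a failing child process into an exception, a failed repair cannot silently hand back empty output.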
+
+How to Split Single Pages
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This deals with splitting up pages of a PDF into arbitrary pieces. For example, you may have a PDF with *Letter* format pages which you want to print with a magnification factor of four: each page is split into 4 pieces, each of which goes to a separate PDF page in *Letter* format again::
+
+    """
+    Create a PDF copy with split-up pages (posterize)
+    ---------------------------------------------------
+    License: GNU GPL V3
+    (c) 2018 Jorj X. McKie
+
+    Usage
+    ------
+    python posterize.py input.pdf
+
+    Result
+    -------
+    A file "poster-input.pdf" with 4 output pages for every input page.
+
+    Notes
+    -----
+    (1) Output file is chosen to have page dimensions of 1/4 of input.
+
+    (2) Easily adapt the example to make n pages per input, or decide per each
+        input page or whatever.
+
+    Dependencies
+    ------------
+    PyMuPDF 1.12.2 or later
+    """
+    from __future__ import print_function
+    import fitz, sys
+    infile = sys.argv[1]  # input file name
+    src = fitz.open(infile)
+    doc = fitz.open()  # empty output PDF
+
+    for spage in src:  # for each page in input
+        r = spage.rect  # input page rectangle
+        d = fitz.Rect(spage.CropBoxPosition,  # CropBox displacement if not
+                      spage.CropBoxPosition)  # starting at (0, 0)
+        #--------------------------------------------------------------------------
+        # example: cut input page into 2 x 2 parts
+        #--------------------------------------------------------------------------
+        r1 = r * 0.5  # top left rect
+        r2 = r1 + (r1.width, 0, r1.width, 0)  # top right rect
+        r3 = r1 + (0, r1.height, 0, r1.height)  # bottom left rect
+        r4 = fitz.Rect(r1.br, r.br)  # bottom right rect
+        rect_list = [r1, r2, r3, r4]  # put them in a list
+
+        for rx in rect_list:  # run thru rect list
+            rx += d  # add the CropBox displacement
+            page = doc.newPage(-1,  # new output page with rx dimensions
+                               width = rx.width,
+                               height = rx.height)
+            page.showPDFpage(
+                    page.rect,  # fill all new page with the image
+                    src,  # input document
+                    spage.number,  # input page number
+                    clip = rx,  # which part to use of input page
+                )
+
+    # that's it, save output file
+    doc.save("poster-" + src.name,
+             garbage = 3,                       # eliminate duplicate objects
+             deflate = True)                    # compress stuff where possible
+
+
+This shows what happens to an input page:
+
+.. image:: images/img-posterize.png
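+
+Note (2) of the script suggests producing *n* pieces per input page. The rectangle arithmetic for a general grid is sketched below in plain Python. ``tiles`` is a hypothetical helper working on ``(x0, y0, x1, y1)`` tuples; each result could presumably be turned into a clip rectangle with ``fitz.Rect(*tile)``:
+
```python
def tiles(rect, cols, rows):
    """Split rect = (x0, y0, x1, y1) into cols x rows equal tiles.

    Tiles are returned row by row, top-left first, i.e. in the same
    order as r1 .. r4 of the script when cols = rows = 2.
    """
    x0, y0, x1, y1 = rect
    w = (x1 - x0) / cols   # tile width
    h = (y1 - y0) / rows   # tile height
    return [
        (x0 + c * w, y0 + r * h, x0 + (c + 1) * w, y0 + (r + 1) * h)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    for t in tiles((0, 0, 612, 792), 2, 2):   # a Letter page, 2 x 2
        print(t)
```
+
+With ``cols = rows = 2`` this reproduces *r1* to *r4* of the script; each output page would then get the tile's width and height.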
+
+--------------------------
+
+How to Combine Single Pages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This deals with joining PDF pages to form a new PDF with pages each combining two or four original ones (also called "2-up", "4-up", etc.). This could be used to create booklets or thumbnail-like overviews::
+
+    '''
+    Copy an input PDF to output combining every 4 pages
+    ---------------------------------------------------
+    License: GNU GPL V3
+    (c) 2018 Jorj X. McKie
+
+    Usage
+    ------
+    python 4up.py input.pdf
+
+    Result
+    -------
+    A file "4up-input.pdf" with 1 output page for every 4 input pages.
+
+    Notes
+    -----
+    (1) Output file is chosen to have A4 portrait pages. Input pages are scaled
+        maintaining side proportions. Both can be changed, e.g. based on input
+        page size. However, note that not all pages need to have the same size, etc.
+
+    (2) Easily adapt the example to combine just 2 pages (like for a booklet) or
+        make the output page dimension dependent on input, or whatever.
+
+    Dependencies
+    -------------
+    PyMuPDF 1.12.1 or later
+    '''
+    from __future__ import print_function
+    import fitz, sys
+    infile = sys.argv[1]
+    src = fitz.open(infile)
+    doc = fitz.open()                      # empty output PDF
+
+    width, height = fitz.PaperSize("a4")   # A4 portrait output page format
+    r = fitz.Rect(0, 0, width, height)
+
+    # define the 4 rectangles per page
+    r1 = r * 0.5                           # top left rect
+    r2 = r1 + (r1.width, 0, r1.width, 0)   # top right
+    r3 = r1 + (0, r1.height, 0, r1.height) # bottom left
+    r4 = fitz.Rect(r1.br, r.br)            # bottom right
+
+    # put them in a list
+    r_tab = [r1, r2, r3, r4]
+
+    # now copy input pages to output
+    for spage in src:
+        if spage.number % 4 == 0:           # create new output page
+            page = doc.newPage(-1,
+                          width = width,
+                          height = height)
+        # insert input page into the correct rectangle
+        page.showPDFpage(r_tab[spage.number % 4],    # select output rect
+                         src,               # input document
+                         spage.number)      # input page number
+
+    # by all means, save new file using garbage collection and compression
+    doc.save("4up-" + infile, garbage = 3, deflate = True)
+
+Example effect:
+
+.. image:: images/img-4up.png
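+
+Two small calculations drive the 4-up script: which output page and slot an input page lands in (``spage.number % 4`` in the script, with a new output page whenever that is 0), and how a page is scaled into its slot while keeping its proportions. The sketch below spells both out in plain Python. ``nup_slot`` and ``fit_rect`` are hypothetical helpers, and centring the scaled page in its slot is an assumption about what proportional scaling should look like, not a description of :meth:`Page.showPDFpage` internals:
+
```python
def nup_slot(page_number, n=4):
    """Return (output_page_index, slot_index) for a 0-based input page."""
    return divmod(page_number, n)


def fit_rect(src_w, src_h, target):
    """Largest rect with the source's aspect ratio, centred in target.

    target is an (x0, y0, x1, y1) tuple; the result has the same form.
    """
    x0, y0, x1, y1 = target
    tw, th = x1 - x0, y1 - y0
    scale = min(tw / src_w, th / src_h)   # shrink until both sides fit
    w, h = src_w * scale, src_h * scale
    dx, dy = (tw - w) / 2, (th - h) / 2   # centre the leftover space
    return (x0 + dx, y0 + dy, x0 + dx + w, y0 + dy + h)


if __name__ == "__main__":
    print([nup_slot(i) for i in range(6)])
    # Letter page into the top-left quarter of an A4 page
    print(fit_rect(612, 792, (0, 0, 297.5, 421)))
```
+
+For example, a Letter page placed into the top-left quarter of an A4 page, ``(0, 0, 297.5, 421)``, fills the slot's full width and leaves 18 points above and below. Passing ``n=2`` gives the booklet-style 2-up variant mentioned in note (2).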
+
+
+--------------------------
+
+How to Convert Any Document to PDF
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here is a script that converts any PyMuPDF-supported document to a PDF. These include XPS, EPUB, FB2, CBZ and all image formats, including multi-page TIFF images.
+
+It preserves any metadata, table of contents and links contained in the source document::
+
+    from __future__ import print_function
+    """
+    Demo script: Convert input file to a PDF
+    -----------------------------------------
+    Intended for multi-page input files like XPS, EPUB etc.
+
+    Features:
+    ---------
+    Recovery of table of contents and links of input file.
+    While this works well for bookmarks (outlines, table of contents),
+    links will only work if they are not of type "LINK_NAMED".
+    This link type is skipped by the script.
+
+    For XPS and EPUB input, internal links however **are** of type "LINK_NAMED".
+    Base library MuPDF does not resolve them to page numbers.
+
+    So anyone expert enough to know the internal structure of these
+    document types can further interpret and resolve these link types.
+
+    Dependencies
+    --------------
+    PyMuPDF v1.14.0+
+    """
+    import sys
+    import fitz
+    if not (list(map(int, fitz.VersionBind.split("."))) >= [1,14,0]):
+        raise SystemExit("need PyMuPDF v1.14.0+")
+    fn = sys.argv[1]
+
+    print("Converting '%s' to '%s.pdf'" % (fn, fn))
+
+    doc = fitz.open(fn)
+
+    b = doc.convertToPDF()                      # convert to pdf
+    pdf = fitz.open("pdf", b)                   # open as pdf
+
+    toc= doc.getToC()                           # table of contents of input
+    pdf.setToC(toc)                             # simply set it for output
+    meta = doc.metadata                         # read and set metadata
+    if not meta["producer"]:
+        meta["producer"] = "PyMuPDF v" + fitz.VersionBind
+
+    if not meta["creator"]:
+        meta["creator"] = "PyMuPDF PDF converter"
+    meta["modDate"] = fitz.getPDFnow()
+    meta["creationDate"] = meta["modDate"]
+    pdf.setMetadata(meta)
+
+    # now process the links
+    link_cnti = 0
+    link_skip = 0
+    for pinput in doc:                # iterate through input pages
+        links = pinput.getLinks()     # get list of links
+        link_cnti += len(links)       # count how many
+        pout = pdf[pinput.number]     # read corresp. output page
+        for l in links:               # iterate through the links
+            if l["kind"] == fitz.LINK_NAMED:    # we do not handle named links
+                print("named link page", pinput.number, l)
+                link_skip += 1        # count them
+                continue
+            pout.insertLink(l)        # simply output the others
+
+    # save the conversion result
+    pdf.save(fn + ".pdf", garbage=4, deflate=True)
+    # say how many named links we skipped
+    if link_cnti > 0:
+        print("Skipped %i named links of a total of %i in input." % (link_skip, link_cnti))
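+
+The script guards against old versions by turning ``fitz.VersionBind`` into a list of integers before comparing. Plain string comparison orders versions character by character, so ``"1.9.0"`` would count as newer than ``"1.14.0"``. A small stand-alone sketch of that check follows (the helper names are hypothetical):
+
```python
def version_tuple(text):
    """Turn "1.17.4" into (1, 17, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def at_least(version, minimum):
    """True if the dotted version string is >= the minimum tuple."""
    return version_tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    print(at_least("1.17.4", (1, 14, 0)))   # numeric comparison
    print("1.9.0" >= "1.14.0")              # string comparison: misleading
```
+
+A version string with a non-numeric suffix would make ``int()`` raise :exc:`ValueError`; strip such suffixes first if your build uses them.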
+
+--------------------------
+
+How to Deal with Messages Issued by MuPDF
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Since PyMuPDF v1.16.0, **error messages** issued by the underlying MuPDF library are redirected to the Python standard device *sys.stderr*, so you can handle them like any other output sent there.
+
+In addition, these messages go to the internal buffer together with any MuPDF warnings -- see below.
+
+We always prefix these messages with an identifying string *"mupdf:"*.
+If you prefer not to see recoverable MuPDF errors at all, issue the command ``fitz.TOOLS.mupdf_display_errors(False)``.
+
+MuPDF warnings continue to be stored in an internal buffer and can be viewed using :meth:`Tools.mupdf_warnings`.
+
+Please note that MuPDF errors may or may not lead to Python exceptions. In other words, you may see error messages from which MuPDF can recover and continue processing.
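+
+Because the messages go through *sys.stderr*, standard Python tools can collect them. The sketch below captures them with :func:`contextlib.redirect_stderr` while calling some function, and keeps only lines carrying the *"mupdf:"* prefix. It assumes that PyMuPDF writes to the Python-level *sys.stderr* object (as described above) rather than directly to the process's file descriptor 2. ``run_capturing_mupdf`` is a hypothetical helper, and the demo uses a fake function instead of ``fitz.open``:
+
```python
import contextlib
import io
import sys


def run_capturing_mupdf(func, *args, **kwargs):
    """Call func(*args, **kwargs); return (result, "mupdf:" stderr lines).

    Assumes messages are written to the Python-level sys.stderr object.
    Any other stderr output produced meanwhile is captured and discarded.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    messages = [line for line in buffer.getvalue().splitlines()
                if line.startswith("mupdf:")]
    return result, messages


if __name__ == "__main__":
    def fake_open(name):
        # stand-in for a library call that reports a recoverable problem
        print("mupdf: cannot find startxref", file=sys.stderr)
        print("some unrelated output", file=sys.stderr)
        return name

    result, messages = run_capturing_mupdf(fake_open, "damaged-file.pdf")
    print(result, messages)
```
+
+With PyMuPDF installed, ``doc, msgs = run_capturing_mupdf(fitz.open, "damaged-file.pdf")`` would then return the document together with the captured messages, provided the assumption above holds. If the called function raises, the captured text is lost along with the exception.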
+
+Example output for a **recoverable error**. We are opening a damaged PDF, but MuPDF is able to repair it and gives us some information about what happened. Then we illustrate how to find out whether the document can later be saved incrementally. Checking the :attr:`Document.isDirty` attribute at this point also indicates that the open had to repair the document:
+
+>>> import fitz
+>>> doc = fitz.open("damaged-file.pdf")  # leads to a sys.stderr message:
+mupdf: cannot find startxref
+>>> print(fitz.TOOLS.mupdf_warnings())  # check if there is more info:
+cannot find startxref
+trying to repair broken xref
+repairing PDF document
+object missing 'endobj' token
+>>> doc.can_save_incrementally()  # this is to be expected:
+False
+>>> # the following indicates whether there are updates so far
+>>> # this is the case because of the repair actions:
+>>> doc.isDirty
+True
+>>> # the document has nevertheless been created:
+>>> doc
+fitz.Document('damaged-file.pdf')
+>>> # we now know that any save must occur to a new file
+
+Example output for an **unrecoverable error**:
+
+>>> import fitz
+>>> doc = fitz.open("does-not-exist.pdf")
+mupdf: cannot open does-not-exist.pdf: No such file or directory
+Traceback (most recent call last):
+  File "<pyshell#1>", line 1, in <module>
+    doc = fitz.open("does-not-exist.pdf")
+  File "C:\Users\Jorj\AppData\Local\Programs\Python\Python37\lib\site-packages\fitz\fitz.py", line 2200, in __init__
+    _fitz.Document_swiginit(self, _fitz.new_Document(filename, stream, filetype, rect, width, height, fontsize))
+RuntimeError: cannot open does-not-exist.pdf: No such file or directory
+>>>
+
+--------------------------
+
+How to Deal with PDF Encryption
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Starting with version 1.16.0, PDF decryption and encryption (using passwords) are fully supported. You can do the following:
+
+* Check whether a document is password protected / (still) encrypted (:attr:`Document.needsPass`, :attr:`Document.isEncrypted`).
+
+* Gain access authorization to a document (:meth:`Document.authenticate`).
+
+* Set encryption details for PDF files using :meth:`Document.save` or :meth:`Document.write` and
+
+    - decrypt or encrypt the content
+    - set password(s)
+    - set the encryption method
+    - set permission details
+
+.. note:: A PDF document may have two different passwords:
+
+   * The **owner password** provides full access rights, including changing passwords, encryption method, or permission details.
+   * The **user password** provides access to document content according to the established permission details. If present, opening the PDF in a viewer will require providing it.
+
+   Method :meth:`Document.authenticate` will automatically establish access rights according to the password used.
+
+The following snippet creates a new PDF and encrypts it with separate user and owner passwords. Permissions are granted to print, copy and annotate, but no changes are allowed to someone authenticating with the user password::
+
+    import fitz
+
+    text = "some secret information"  # keep this data secret
+    perm = int(
+        fitz.PDF_PERM_ACCESSIBILITY  # always use this
+        | fitz.PDF_PERM_PRINT  # permit printing
+        | fitz.PDF_PERM_COPY  # permit copying
+        | fitz.PDF_PERM_ANNOTATE  # permit annotations
+    )
+    owner_pass = "owner"  # owner password
+    user_pass = "user"  # user password
+    encrypt_meth = fitz.PDF_ENCRYPT_AES_256  # strongest algorithm
+    doc = fitz.open()  # empty pdf
+    page = doc.newPage()  # empty page
+    page.insertText((50, 72), text)  # insert the data
+    doc.save(
+        "secret.pdf",
+        encryption=encrypt_meth,  # set the encryption method
+        owner_pw=owner_pass,  # set the owner password
+        user_pw=user_pass,  # set the user password
+        permissions=perm,  # set permissions
+    )
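+
+The *permissions* value is a bit field, which is why the flags above are combined with ``|``. The sketch below makes this explicit with plain integers. The bit positions follow the user access permissions of the PDF specification (bit 1 being the lowest bit). That ``fitz.PDF_PERM_*`` carry exactly these values is an assumption you can check by printing the constants. ``allows`` is a hypothetical helper:
+
```python
# Permission bits as numbered in the PDF specification (bit 1 = lowest).
# Assumption: fitz.PDF_PERM_* use the same values; check your version.
PERM_PRINT = 1 << 2           # bit 3: print
PERM_MODIFY = 1 << 3          # bit 4: modify contents
PERM_COPY = 1 << 4            # bit 5: copy / extract text and graphics
PERM_ANNOTATE = 1 << 5        # bit 6: add or modify annotations
PERM_FORM = 1 << 8            # bit 9: fill in form fields
PERM_ACCESSIBILITY = 1 << 9   # bit 10: extract for accessibility
PERM_ASSEMBLE = 1 << 10       # bit 11: assemble (insert, rotate, delete pages)
PERM_PRINT_HQ = 1 << 11       # bit 12: high-quality printing


def allows(permissions, flag):
    """True if every bit of `flag` is set in `permissions`."""
    return (permissions & flag) == flag


if __name__ == "__main__":
    perm = PERM_ACCESSIBILITY | PERM_PRINT | PERM_COPY | PERM_ANNOTATE
    print(perm, allows(perm, PERM_PRINT), allows(perm, PERM_MODIFY))
```
+
+For the snippet above the combined value is 564 (512 + 4 + 16 + 32), and ``allows(perm, PERM_MODIFY)`` is *False*, matching "no changes allowed".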
+
+Opening this document with some viewer (Nitro Reader 5) reflects these settings:
+
+.. image:: images/img-encrypting.jpg
+   :scale: 50
+
+As before, saving without any encryption parameters automatically **decrypts** the document.
+
+To **keep the encryption method** of a PDF, save it using *encryption=fitz.PDF_ENCRYPT_KEEP*. If *doc.can_save_incrementally() == True*, an incremental save is also possible.
+
+To **change the encryption method**, specify the full range of options above (encryption, owner_pw, user_pw, permissions). An incremental save is **not possible** in this case.
+
+
+--------------------------
+
+Common Issues and their Solutions
+---------------------------------
+
+Changing Annotations: Unexpected Behaviour
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Problem
+^^^^^^^^^
+There are two scenarios:
+
+1. Updating an annotation, which has been created by some other software, via a PyMuPDF script.
+2. Creating an annotation with PyMuPDF and later changing it using some other PDF application.
+
+In both cases you may experience unintended changes like a different annotation icon or text font, the fill color or line dashing have disappeared, line end symbols have changed their size or even have disappeared too, etc.
+
+Cause
+^^^^^^
+Annotation maintenance is handled differently by each PDF maintenance application (if it is supported at all). For any given PDF application, some annotation types may not be supported at all or only partly, or some details may be handled in a different way than with another application.
+
+Almost always a PDF application also comes with its own icons (file attachments, sticky notes and stamps) and its own set of supported text fonts. For example:
+
+* (Py-) MuPDF only supports these 5 basic fonts for 'FreeText' annotations: Helvetica, Times-Roman, Courier, ZapfDingbats and Symbol -- no italics / no bold variations. When changing a 'FreeText' annotation created by some other app, its font will probably be neither recognized nor accepted, and will be replaced by Helvetica.
+
+* PyMuPDF fully supports the PDF text markers, but these types cannot be updated with Adobe Acrobat Reader.
+
+Most applications also have only limited support for line dashing, which causes existing dashes to be replaced by solid lines. For example:
+
+* PyMuPDF fully supports all line dashing forms, while other viewers only accept a limited subset.
+
+
+Solutions
+^^^^^^^^^^
+Unfortunately there is not much you can do in most of these cases.
+
+1. Stay with the same software for **creating and changing** an annotation.
+2. When using PyMuPDF to change an "alien" annotation, try to **avoid** :meth:`Annot.update`. The following methods **can be used without it** so that the original appearance should be maintained:
+
+  * :meth:`Annot.setRect` (location changes)
+  * :meth:`Annot.setFlags` (annotation behaviour)
+  * :meth:`Annot.setInfo` (meta information, except changes to *content*)
+  * :meth:`Annot.fileUpd` (file attachment changes)
+
+Misplaced Item Insertions on PDF Pages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Problem
+^^^^^^^^^
+
+You inserted an item (like an image, an annotation or some text) on an existing PDF page, but later you find it being placed at a different location than intended. For example an image should be inserted at the top, but it unexpectedly appears near the bottom of the page.
+
+Cause
+^^^^^^
+
+The creator of the PDF has established a non-standard page geometry without keeping it "local" (as they should!). Most commonly, the PDF standard point (0,0) at *bottom-left* has been changed to the *top-left* point. So top and bottom are reversed -- causing your insertion to be misplaced.
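+
+What "top and bottom are reversed" means numerically can be shown with the matrix arithmetic behind the *cm* operator: a matrix *[a b c d e f]* maps a point *(x, y)* to ``(a*x + c*y + e, b*x + d*y + f)``. The sketch below applies the matrix ``1 0 0 -1 0 792`` that appears in the example further down (a Letter page, 792 points high). ``apply_matrix`` is a hypothetical helper:
+
```python
def apply_matrix(m, point):
    """Map (x, y) with a PDF matrix m = (a, b, c, d, e, f), as 'cm' does."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


# "1 0 0 -1 0 792 cm": mirror vertically on a 792 point high page
FLIP_LETTER = (1, 0, 0, -1, 0, 792)

if __name__ == "__main__":
    print(apply_matrix(FLIP_LETTER, (50, 72)))   # near bottom -> near top
```
+
+A point 72 points above the bottom edge ends up 72 points below the top edge, so anything inserted while such a matrix is still in effect is mirrored vertically.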
+
+The visible image of a PDF page is controlled by commands coded in a special mini-language. For an overview of this language, consult "Operator Summary" on p. 985 of the :ref:`AdobeManual`. These commands are stored in :data:`contents` objects as strings (*bytes* in PyMuPDF).
+
+There are commands in that language which change the coordinate system of the page for all following commands. To keep the scope of such commands local, they must be wrapped by the command pair *q* ("save graphics state", or "stack") and *Q* ("restore graphics state", or "unstack").
+
+.. highlight:: text
+
+So the PDF creator did this::
+
+    stream
+    1 0 0 -1 0 792 cm    % <=== change of coordinate system:
+    ...                  % letter page, top / bottom reversed
+    ...                  % remains active beyond these lines
+    endstream
+
+where they should have done this::
+
+    stream
+    q                    % put the following in a stack
+    1 0 0 -1 0 792 cm    % <=== scope of this is limited by Q command
+    ...                  % here, a different geometry exists
+    Q                    % after this line, geometry of outer scope prevails
+    endstream
+
+.. note::
+
+   * In the mini-language's syntax, spaces and line breaks are equally accepted token delimiters.
+   * Multiple consecutive delimiters are treated as one.
+   * Keywords "stream" and "endstream" are inserted automatically -- not by the programmer.
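+
+The wrapping rule can be checked mechanically. The rough sketch below splits a content stream on whitespace (which the note above permits) and reports whether every other token sits inside a balanced *q* ... *Q* pair; ``wrap`` adds the same ``b"q\n"`` and ``b"\nQ"`` pieces used with ``fitz.TOOLS._insert_contents`` further below. Both helpers are hypothetical and are not claimed to match how :attr:`Page._isWrapped` works. In particular, operators glued to delimiters (e.g. ``Q[``) or *q* / *Q* inside strings would be misread:
+
```python
def is_wrapped(content):
    """Rough check: is every token inside a balanced q ... Q pair?"""
    depth = 0
    for token in content.split():      # whitespace-delimited tokens
        if token == b"q":
            depth += 1                 # save graphics state
        elif token == b"Q":
            depth -= 1                 # restore graphics state
            if depth < 0:
                return False           # more Q than q
        elif depth == 0:
            return False               # operator/operand outside any q ... Q
    return depth == 0


def wrap(content):
    """Enclose a content stream in a q ... Q pair."""
    return b"q\n" + content + b"\nQ"


if __name__ == "__main__":
    unsafe = b"1 0 0 -1 0 792 cm\n0 0 m 100 100 l S"
    print(is_wrapped(unsafe), is_wrapped(wrap(unsafe)))
```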
+
+.. highlight:: python
+
+Solutions
+^^^^^^^^^^
+
+Since v1.16.0, there is the property :attr:`Page._isWrapped`, which lets you check whether a page's contents are wrapped in that command pair.
+
+If it is *False* or if you want to be on the safe side, pick one of the following:
+
+1. The easiest way: in your script, do a :meth:`Page._cleanContents` before you do your first item insertion.
+2. Pre-process your PDF with the MuPDF command line utility *mutool clean -c ...* and work with its output file instead.
+3. Directly wrap the page's :data:`contents` with the stacking commands before you do your first item insertion.
+
+**Solutions 1. and 2.** use the same technical basis and **do a lot more** than what is required in this context: they also clean up other inconsistencies or redundancies that may exist, multiple */Contents* objects will be concatenated into one, and much more.
+
+.. note:: For **incremental saves,** solution 1. has an unpleasant implication: it will bloat the update delta, because it changes so many things and, in addition, stores the **cleaned contents uncompressed**. So, if you use :meth:`Page._cleanContents` you should consider **saving to a new file** with (at least) *garbage=3* and *deflate=True*.
+
+**Solution 3.** is completely under your control and only does the minimum corrective action. There exists a handy low-level utility function which you can use for this. Suggested procedure:
+
+* **Prepend** the missing stacking command by executing *fitz.TOOLS._insert_contents(page, b"q\n", False)*.
+* **Append** an unstacking command by executing *fitz.TOOLS._insert_contents(page, b"\nQ", True)*.
+* Alternatively, just use :meth:`Page._wrapContents`, which executes the previous two functions.
+
+.. note:: If small incremental update deltas are a concern, this approach is the most effective. Other contents objects are not touched. The utility method creates two new PDF :data:`stream` objects and inserts them before and after the page's other :data:`contents`, respectively. We therefore recommend the following snippet to get this situation under control:
+
+    >>> if not page._isWrapped:
+    ...     page._wrapContents()
+    >>> # start inserting text, images or annotations here
+
+--------------------------
+
+Low-Level Interfaces
+---------------------
+Numerous methods are available to access and manipulate PDF files on a fairly low level. Admittedly, a clear distinction between "low level" and "normal" functionality is not always possible and is partly a matter of personal taste.
+
+It may also happen that functionality previously deemed low-level is later considered part of the normal interface. This happened in v1.14.0 to the class :ref:`Tools`, which you now find as an item in the Classes chapter.
+
+In any case, this only affects where in the documentation you find something: all functionality is always available, and always via the same interface.
+
+----------------------------------
+
+How to Iterate through the :data:`xref` Table
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A PDF's :data:`xref` table is a list of all objects defined in the file. This table may easily contain many thousands of entries; the :ref:`AdobeManual`, for example, has over 330,000 objects. Table entry "0" is reserved and must not be touched.
+The following script loops through the :data:`xref` table and prints each object's definition::
+
+    >>> xreflen = doc.xrefLength()  # length of objects table
+    >>> for xref in range(1, xreflen):  # skip item 0!
+    ...     print("")
+    ...     print("object %i (stream: %s)" % (xref, doc.isStream(xref)))
+    ...     print(doc.xrefObject(xref, compressed=False))
+
+
+.. highlight:: text
+
+This produces the following output::
+
+    object 1 (stream: False)
+    <<
+        /ModDate (D:20170314122233-04'00')
+        /PXCViewerInfo (PDF-XChange Viewer;2.5.312.1;Feb  9 2015;12:00:06;D:20170314122233-04'00')
+    >>
+
+    object 2 (stream: False)
+    <<
+        /Type /Catalog
+        /Pages 3 0 R
+    >>
+
+    object 3 (stream: False)
+    <<
+        /Kids [ 4 0 R 5 0 R ]
+        /Type /Pages
+        /Count 2
+    >>
+
+    object 4 (stream: False)
+    <<
+        /Type /Page
+        /Annots [ 6 0 R ]
+        /Parent 3 0 R
+        /Contents 7 0 R
+        /MediaBox [ 0 0 595 842 ]
+        /Resources 8 0 R
+    >>
+    ...
+    object 7 (stream: True)
+    <<
+        /Length 494
+        /Filter /FlateDecode
+    >>
+    ...
+
+.. highlight:: python
+
+A PDF object definition is an ordinary ASCII string.
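+
+Because the definition is plain text, references to other objects can be picked out of it: an indirect reference has the form *N G R* (object number, generation number, keyword *R*), as in ``/Kids [ 4 0 R 5 0 R ]`` above. A rough regular-expression sketch follows. ``references`` is a hypothetical helper; it does not skip strings or comments, so reference-like text inside a string would also match:
+
```python
import re

# indirect reference: object number, generation number, keyword "R"
# (\b stops "R" from matching the start of longer keywords such as "RG")
INDIRECT_REF = re.compile(r"(\d+)\s+(\d+)\s+R\b")


def references(obj_definition):
    """Return the object numbers referenced in a PDF object definition."""
    return [int(num) for num, _gen in INDIRECT_REF.findall(obj_definition)]


if __name__ == "__main__":
    obj3 = "<<\n    /Kids [ 4 0 R 5 0 R ]\n    /Type /Pages\n    /Count 2\n>>"
    print(references(obj3))
```
+
+Feeding each returned number back into :meth:`Document.xrefObject` is one way to walk from the catalog to the page tree and on to the individual pages.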
+
+----------------------------------
+
+How to Handle Object Streams
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Some object types contain additional data apart from their object definition. Examples are images, fonts, embedded files or commands describing the appearance of a page.
+
+Objects of these types are called "stream objects". PyMuPDF allows reading an object's stream via method :meth:`Document.xrefStream` with the object's :data:`xref` as an argument. It is also possible to write back a modified version of a stream using :meth:`Document.updateStream`.
+
+The following snippet reads all streams of a PDF (for whatever reason) and, as an example, writes them back unchanged::
+
+    >>> xreflen = doc.xrefLength()  # number of objects in file
+    >>> for xref in range(1, xreflen):  # skip item 0!
+    ...     stream = doc.xrefStream(xref)
+    ...     # do something with it (it is a bytes object or None)
+    ...     # e.g. just write it back:
+    ...     if stream:
+    ...         doc.updateStream(xref, stream)
+
+:meth:`Document.xrefStream` automatically returns a stream decompressed as a bytes object, and :meth:`Document.updateStream` automatically compresses it (where beneficial).
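+
+Streams flagged with ``/Filter /FlateDecode`` (like object 7 in the listing further up) are zlib/deflate compressed. The sketch below illustrates the "where beneficial" idea with Python's :mod:`zlib`: compress, and keep the result only if it is actually smaller. This decision rule is an assumption for illustration, not a description of what MuPDF does internally; ``compress_if_beneficial`` is hypothetical:
+
```python
import zlib


def compress_if_beneficial(data):
    """Return (payload, compressed_flag): deflate only when it saves space."""
    packed = zlib.compress(data)          # zlib format, as used by FlateDecode
    if len(packed) < len(data):
        return packed, True
    return data, False                    # keep the original bytes


if __name__ == "__main__":
    contents = b"0 0 m 100 100 l S\n" * 100   # repetitive operator text
    packed, flag = compress_if_beneficial(contents)
    print(len(contents), len(packed), flag)
    print(compress_if_beneficial(bytes(range(256)))[1])
```
+
+Typical page contents (highly repetitive operator text) shrink considerably, while data without repetitions, such as the 256 distinct byte values in the demo, does not.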
+
+----------------------------------
+
+How to Handle Page Contents
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A PDF page can have one or more :data:`contents` objects -- in fact, a page will be empty if it has no such object. These are stream objects describing **what** appears **where** on a page (like text and images). They are written in a special mini-language described e.g. in chapter "APPENDIX A - Operator Summary" on page 985 of the :ref:`AdobeManual`.
+
+Every PDF reader application must be able to interpret the contents syntax to reproduce the intended appearance of the page.
+
+If multiple :data:`contents` objects are provided, they must be read and interpreted in the specified sequence, exactly as if they had been provided as one single concatenated stream.
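+
+One practical consequence: when you join several contents streams yourself, put whitespace between them. If one stream ends with *Q* and the next begins with *q*, gluing the bytes directly yields the meaningless token *Qq*. A sketch (``concat_contents`` is a hypothetical helper):
+
```python
def concat_contents(streams):
    """Join contents streams with a newline so tokens cannot merge."""
    return b"\n".join(streams)


if __name__ == "__main__":
    parts = [b"q 0 0 m 10 10 l S Q", b"q 5 w Q"]
    print(b"".join(parts).split())          # contains the bogus token b'Qq'
    print(concat_contents(parts).split())   # Q and q stay separate tokens
```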
+
+There are good technical arguments for having multiple :data:`contents` objects:
+
+* It is a lot easier and faster to just add new :data:`contents` objects than maintaining a single big one (which entails reading, decompressing, modifying, recompressing, and rewriting it for each change).
+* When working with incremental updates, a modified big :data:`contents` object will bloat the update delta and can thus easily negate the efficiency of incremental saves.
+
+For example, PyMuPDF adds new, small :data:`contents` objects in methods :meth:`Page.insertImage`, :meth:`Page.showPDFpage()` and the :ref:`Shape` methods.
+
+However, there are also situations when a **single** :data:`contents` object is beneficial: it is easier to interpret and better compressible than multiple smaller ones.
+
+Here are two ways of combining multiple contents of a page::
+
+    >>> # method 1: use the clean function
+    >>> for i in range(len(doc)):
+    ...     doc[i]._cleanContents()  # cleans and combines multiple Contents
+    ...     page = doc[i]  # re-read the page (has only 1 contents now)
+    ...     cont = doc.xrefStream(page._getContents()[0])
+    ...     # do something with the cleaned, combined contents
+
+    >>> # method 2: concatenate multiple contents yourself
+    >>> for page in doc:
+    ...     cont = b""  # initialize contents
+    ...     for xref in page._getContents():  # loop through content xrefs
+    ...         cont += doc.xrefStream(xref) + b"\n"  # keep tokens separated
+    ...     # do something with the combined contents
+
+The clean function :meth:`Page._cleanContents` does a lot more than just gluing :data:`contents` objects: it also corrects and optimizes the PDF operator syntax of the page and removes any inconsistencies.
+
+----------------------------------
+
+How to Access the PDF Catalog
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This is a central ("root") object of a PDF. It serves as a starting point to reach important other objects and it also contains some global options for the PDF::
+
+    >>> import fitz
+    >>> doc=fitz.open("PyMuPDF.pdf")
+    >>> cat = doc._getPDFroot()            # get xref of the /Catalog
+    >>> print(doc.xrefObject(cat))     # print object definition
+    <<
+        /Type/Catalog                 % object type
+        /Pages 3593 0 R               % points to page tree
+        /OpenAction 225 0 R           % action to perform on open
+        /Names 3832 0 R               % points to global names tree
+        /PageMode /UseOutlines        % initially show the TOC
+        /PageLabels<</Nums[0<</S/D>>2<</S/r>>8<</S/D>>]>> % names given to pages
+        /Outlines 3835 0 R            % points to outline tree
+    >>
+
+.. note:: Indentation, line breaks and comments are inserted here for clarification purposes only and will not normally appear. For more information on the PDF catalog see section 3.6.1 on page 137 of the :ref:`AdobeManual`.
+
+----------------------------------
+
+How to Access the PDF File Trailer
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The trailer of a PDF file is a :data:`dictionary` located towards the end of the file. It contains special objects and pointers to other important information. See :ref:`AdobeManual` p. 96. Here is an overview:
+
+======= =========== ===================================================================================
+**Key** **Type**    **Value**
+======= =========== ===================================================================================
+Size    int         Total number of entries in the cross-reference table (highest object number + 1).
+Prev    int         Offset to previous :data:`xref` section (indicates incremental updates).
+Root    dictionary  (indirect) Pointer to the catalog. See previous section.
+Encrypt dictionary  Pointer to encryption object (encrypted files only).
+Info    dictionary  (indirect) Pointer to information (metadata).
+ID      array       File identifier consisting of two byte strings.
+XRefStm int         Offset of a cross-reference stream. See :ref:`AdobeManual` p. 109.
+======= =========== ===================================================================================
+
+Access this information via PyMuPDF with :meth:`Document._getTrailerString`.
+
+    >>> import fitz
+    >>> doc=fitz.open("PyMuPDF.pdf")
+    >>> trailer=doc._getTrailerString()
+    >>> print(trailer)
+    <</Size 5535/Info 5275 0 R/Root 5274 0 R/ID[(\340\273fE\225^l\226\232O|\003\201\325g\245)(}#1,\317\205\000\371\251wO6\352Oa\021)]>>
+    >>>
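+
+The trailer string can be dissected with a few regular expressions. The sketch below extracts the integer entries and the object numbers of the indirect references listed in the table above. ``trailer_entries`` is a hypothetical helper that assumes a simple, single-level trailer dictionary like the one printed here; a direct (non-referenced) */Encrypt* dictionary, for instance, is simply not reported:
+
```python
import re


def trailer_entries(trailer):
    """Return selected entries of a PDF trailer string as ints."""
    entries = {}
    for key in ("Size", "Prev", "XRefStm"):          # plain integers
        m = re.search(r"/%s\s+(\d+)" % key, trailer)
        if m:
            entries[key] = int(m.group(1))
    for key in ("Root", "Info", "Encrypt"):          # indirect "N G R"
        m = re.search(r"/%s\s+(\d+)\s+\d+\s+R" % key, trailer)
        if m:
            entries[key] = int(m.group(1))           # object number only
    return entries


if __name__ == "__main__":
    print(trailer_entries("<</Size 5535/Info 5275 0 R/Root 5274 0 R>>"))
```
+
+For the trailer shown above this yields *Size* 5535, *Info* 5275 and *Root* 5274; the *Root* number is the :data:`xref` of the catalog.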
+
+----------------------------------
+
+How to Access XML Metadata
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A PDF may contain XML metadata in addition to the standard metadata format. In fact, most PDF reader or modification software adds this type of information when being used to save a PDF (Adobe, Nitro PDF, PDF-XChange, etc.).
+
+PyMuPDF has no way to **interpret or change** this information directly, because it contains no XML features. The XML metadata is however stored as a :data:`stream` object, so we do provide a way to **read the XML** stream and, potentially, also write back a modified stream or even delete it::
+
+    >>> metaxref = doc._getXmlMetadataXref()           # get xref of XML metadata
+    >>> # check if metaxref > 0!!!
+    >>> doc._getXrefString(metaxref)                   # object definition
+    '<</Subtype/XML/Length 3801/Type/Metadata>>'
+    >>> xmlmetadata = doc._getXrefStream(metaxref)     # XML data (stream - bytes obj)
+    >>> print(xmlmetadata.decode("utf8"))              # print str version of bytes
+    <?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
+    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-702">
+    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
+    ...
+    omitted data
+    ...
+    <?xpacket end="w"?>
+
+Using an XML package, the XML data can be interpreted and/or modified, and then stored back::
+
+    >>> # write back modified XML metadata:
+    >>> doc._updateStream(metaxref, xmlmetadata)
+    >>>
+    >>> # if these data are not wanted, delete them:
+    >>> doc._delXmlMetadata()
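For example, Python's standard ``xml.etree.ElementTree`` module can read the ``dc:title`` entries from the bytes returned by ``_getXrefStream``. This is a minimal sketch assuming the packet contains a single ``x:xmpmeta`` element; the helper name ``xmp_titles`` is hypothetical, and real XMP packets may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in typical XMP packets
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def xmp_titles(xmlmetadata: bytes) -> list:
    """Return the text of all dc:title entries of an XMP packet (hypothetical helper)."""
    text = xmlmetadata.decode("utf8")
    # keep only the <x:xmpmeta> element; the <?xpacket ...?> wrappers are dropped
    end_tag = "</x:xmpmeta>"
    start = text.find("<x:xmpmeta")
    end = text.find(end_tag)
    if start < 0 or end < 0:
        return []
    root = ET.fromstring(text[start:end + len(end_tag)])
    rdf_li = "{%s}li" % NS["rdf"]
    titles = []
    for title in root.iterfind(".//dc:title", NS):
        # a title is usually an rdf:Alt list with one rdf:li per language
        titles.extend(li.text or "" for li in title.iter(rdf_li))
    return titles


packet = b"""<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Sample title</rdf:li></rdf:Alt></dc:title>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""
print(xmp_titles(packet))
```

With a document, the call would be ``xmp_titles(doc._getXrefStream(metaxref))``, after checking that ``metaxref > 0``.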
diff --git a/docs/font.rst b/docs/font.rst

new file mode 100644 (file)

index 0000000..519ce9b
--- /dev/null
+++ b/docs/font.rst
@@ -0,0 +1,141 @@
+.. _Font:
+
+================
+Font
+================
+
+*(New in v1.16.18)* This class represents a font as defined in MuPDF (*fz_font_s* structure). It is required for the new class :ref:`TextWriter` and the new :meth:`Page.writeText`. Currently, it has no connection to how fonts are used in methods ``insertText`` or ``insertTextbox``.
+
+A Font object also contains useful general information, like the font bbox, the number of defined glyphs, glyph names or the bbox of a single glyph.
+
+**Class API**
+
+.. class:: Font
+
+   .. method:: __init__(self, fontname=None, fontfile=None,
+                  fontbuffer=None, script=0, language=None, ordering=-1, is_bold=0,
+                  is_italic=0, is_serif=0)
+
+      Font constructor. The many parameters are used to locate the font that most closely matches the requirements. Not all parameters are ever required; see the pseudo code below, which explains how the parameters are evaluated.
+
+      :arg str fontname: one of the :ref:`Base-14-Fonts` or CJK fontnames. A few other names are also possible (watch the correct spelling): "Arial", "Times", "Times Roman".
+      
+         *(Changed in v1.17.4)*
+
+         If you have installed `pymupdf-fonts <https://pypi.org/project/pymupdf-fonts/>`_, you can also use the following new "reserved" fontnames: "figo", "figbo", "figit", "figbi", "fimo", and "fimbo". These provide one of the "FiraGO" or "Fira Mono" fonts created by Mozilla.
+
+      :arg str fontfile: the filename of a fontfile somewhere on your system [#f1]_.
+      :arg bytes,bytearray,io.BytesIO fontbuffer: a fontfile loaded in memory [#f1]_.
+      :arg int script: the number of a UCDN script. Currently supported in PyMuPDF are numbers 24, and 32 through 35.
+      :arg str language: one of the values "zh-Hant" (traditional Chinese), "zh-Hans" (simplified Chinese), "ja" (Japanese) and "ko" (Korean). Otherwise, all ISO 639 codes from the subsets 1, 2, 3 and 5 are also possible, but are currently documentary only.
+      :arg int ordering: an alternative selector for one of the CJK fonts.
+      :arg bool is_bold: look for a bold font.
+      :arg bool is_italic: look for an italic font.
+      :arg bool is_serif: look for a serifed font.
+
+      :returns: a MuPDF font if successful. This is the overall logic, how an appropriate font is located::
+
+         if fontfile:
+            create font from it ignoring other arguments
+            if not successful -> exception
+         if fontbuffer:
+            create font from it ignoring other arguments
+            if not successful -> exception
+         if ordering >= 0:
+            load "universal" font ignoring other parameters
+            # this will always be successful
+         if fontname:
+            create a Base14 font, or resp. "universal" font, ignoring other parameters
+            # note: values "Arial", "Times", "Times Roman" are also possible
+            if not successful -> exception
+         Finally try to load a "NOTO" font using *script* and *language* parameters.
+         if not successful:
+            look for fallback font
+
+      .. note::
+
+        With the usual abbreviations "helv", "tiro", etc., you will create fonts with the expected names "Helvetica", "Times-Roman" and so on.
+
+        Using *ordering >= 0*, or fontnames starting with "china", "japan" or "korea" will always create the same **"universal"** font **"Droid Sans Fallback Regular"**. This font supports **all CJK and all Latin characters**.
+
+        You will rarely need a font other than **"Droid Sans Fallback Regular"**, **except** that this font file is relatively large and adds about 1.65 MB (compressed) to your PDF file size. If you do not need CJK support, stick with specifying "helv", "tiro" etc., and you will get away with about 35 KB compressed.
+
+        If you **know** you have a mixture of CJK and Latin text, consider just using ``Font(ordering=0)`` because this supports everything and also significantly (by a factor of two to three) speeds up execution: MuPDF will always find any character in this single font and need not check fallbacks.
+
+        But if you do specify a Base-14 fontname, you will still be able to also write CJK characters! MuPDF automatically detects this situation and silently falls back to the universal font (which will then of course also be embedded in your PDF).
+
+        *(New in v1.17.4)* Optionally, a set of new "reserved" fontnames becomes available if you install `pymupdf-fonts <https://pypi.org/project/pymupdf-fonts/>`_. The currently available fonts are from the Fira font family created by Mozilla. "Fira Mono" is a nice monospaced sans font set, and FiraGO is another sans-serif "universal" font set, which supports all European languages (including Cyrillic and Greek) plus Thai, Arabic, Hebrew and Devanagari, but none of the CJK languages. A FiraGO font is only a quarter of the size of "Droid Sans Fallback" (compressed 400 KB vs. 1.65 MB), and the style variants bold and italic are available. The following table maps a fontname to the corresponding font:
+
+            =========== =======================================
+            Fontname    Font
+            =========== =======================================
+            figo        FiraGO Regular
+            figbo       FiraGO Bold
+            figit       FiraGO Italic
+            figbi       FiraGO Bold Italic
+            fimo        Fira Mono Regular
+            fimbo       Fira Mono Bold
+            =========== =======================================
+
+        **All fonts mentioned here** also support Greek and Cyrillic letters.
+
+   .. method:: has_glyph(chr, language=None, script=0)
+
+      Check whether the unicode *chr* exists in the font or some fallback. May be used to check whether any "TOFU" symbols will appear on output.
+
+      :arg int chr: the unicode of the character (i.e. *ord()*).
+      :arg str language: the language -- currently unused.
+      :arg int script: the UCDN script number.
+      :returns: *True* or *False*.
+
+   .. method:: glyph_advance(chr, language=None, script=0, wmode=0)
+
+      Calculate the "width" of the character's glyph (visual representation).
+
+      :arg int chr: the unicode number of the character. Use ``ord(c)``, not the character itself. Again, this should normally work even if a character is not supported by that font, because fallback fonts will be checked where necessary.
+
+      The other parameters are not in use currently. This especially means that only horizontal text writing is supported.
+
+      :returns: a float representing the glyph's width relative to **fontsize 1**.
+
+   .. method:: glyph_name_to_unicode(name)
+
+      Return the unicode for a given glyph name. Use it in conjunction with ``chr()`` if you want to output e.g. a certain symbol.
+
+      :arg str name: The name of the glyph.
+
+      :returns: The unicode integer, or 65533 = 0xFFFD if the name is unknown. Examples: ``font.glyph_name_to_unicode("Sigma") = 931``, ``font.glyph_name_to_unicode("sigma") = 963``. Refer to e.g. `this <https://github.com/adobe-type-tools/agl-aglfn/blob/master/glyphlist.txt>`_ publication for a list of glyph names and their unicode numbers.
+
+   .. method:: unicode_to_glyph_name(chr, language=None, script=0, wmode=0)
+
+      Show the name of the character's glyph.
+
+      :arg int chr: the unicode number of the character. Use ``ord(c)``, not the character itself.
+
+      :returns: a string representing the glyph's name. E.g. ``font.unicode_to_glyph_name(ord("#")) = "numbersign"``. Depending on how this font was built, the string may be empty, ".notfound" or some generated name.
+
+   .. method:: text_length(text, fontsize=11)
+
+      Calculate the length of a unicode string.
+
+      :arg str text: a text string -- UTF-8 encoded. For Python 2, you must use unicode here.
+
+      :arg float fontsize: the fontsize.
+
+      :returns: a float representing the length of the string when stored in the PDF. Internally :meth:`glyph_advance` is used on a by-character level. If the font does not have a character, it will automatically be looked up in a fallback font.
+
+   .. attribute:: flags
+
+      A dictionary with various font properties, each represented as bools.
+
+   .. attribute:: name
+
+      Name of the font. May be "" or "(null)".
+
+   .. attribute:: glyph_count
+
+      The number of glyphs defined in the font.
+
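:meth:`Font.text_length` is described above as applying :meth:`Font.glyph_advance` character by character. The arithmetic can be sketched in plain Python; here ``advance`` is a stand-in for any callable that, like ``glyph_advance``, maps a Unicode number to a width relative to fontsize 1. This illustrates the calculation only and is not PyMuPDF's implementation; the width values are made up.

```python
from typing import Callable


def text_length(text: str, advance: Callable[[int], float], fontsize: float = 11) -> float:
    """Sum the per-character advances (relative to fontsize 1) and scale by fontsize."""
    return sum(advance(ord(c)) for c in text) * fontsize


# hypothetical advance widths for two characters
widths = {ord("m"): 0.75, ord("i"): 0.25}
print(text_length("mi", widths.__getitem__, fontsize=12))
```

With PyMuPDF installed, passing a font's bound method, e.g. ``text_length(s, fitz.Font("helv").glyph_advance)``, should yield the same kind of value, because ``glyph_advance`` takes the Unicode number as its first argument.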
+.. rubric:: Footnotes
+
+.. [#f1] MuPDF does not support all fontfiles with this feature and will raise exceptions like *"mupdf: FT_New_Memory_Face((null)): unknown file format"* if it encounters issues.
diff --git a/docs/functions.rst b/docs/functions.rst

new file mode 100644 (file)

index 0000000..10453e4
--- /dev/null
+++ b/docs/functions.rst
@@ -0,0 +1,704 @@
+============
+Functions
+============
+The following are miscellaneous functions and attributes on a fairly low technical level.
+
+Some functions provide detailed access to PDF structures. Others are stripped-down, high-performance versions of other functions which provide more information.
+
+Yet others are handy, general-purpose utilities.
+
+
+==================================== ==============================================================
+**Function**                         **Short Description**
+==================================== ==============================================================
+:meth:`Annot._cleanContents`         PDF only: clean the annot's :data:`contents` objects
+:meth:`Annot.setAPNMatrix`           PDF only: set the matrix of the appearance object
+:attr:`Annot.APNMatrix`              PDF only: the matrix of the appearance object
+:attr:`Annot.APNBBox`                PDF only: bbox of the appearance object
+:meth:`ConversionHeader`             return header string for *getText* methods
+:meth:`ConversionTrailer`            return trailer string for *getText* methods
+:meth:`Document._delXmlMetadata`     PDF only: remove XML metadata
+:meth:`Document._deleteObject`       PDF only: delete an object
+:meth:`Document._getNewXref`         PDF only: create and return a new :data:`xref` entry
+:meth:`Document._getOLRootNumber`    PDF only: return / create :data:`xref` of */Outline*
+:meth:`Document._getPDFroot`         PDF only: return the :data:`xref` of the catalog
+:meth:`Document._getPageObjNumber`   PDF only: return :data:`xref` and generation number of a page
+:meth:`Document._getPageXref`        PDF only: same as *_getPageObjNumber()*
+:meth:`Document._getTrailerString`   PDF only: return the PDF file trailer string
+:meth:`Document._getXmlMetadataXref` PDF only: return XML metadata :data:`xref` number
+:meth:`Document._getXrefLength`      PDF only: return length of :data:`xref` table
+:meth:`Document._getXrefStream`      PDF only: return content of a stream object
+:meth:`Document._getXrefString`      PDF only: return object definition "source"
+:meth:`Document._make_page_map`      PDF only: create a fast-access array of page numbers
+:meth:`Document._updateObject`       PDF only: insert or update a PDF object
+:meth:`Document._updateStream`       PDF only: replace the stream of an object
+:meth:`Document.extractFont`         PDF only: extract embedded font
+:meth:`Document.extractImage`        PDF only: extract embedded image
+:meth:`Document.getCharWidths`       PDF only: return a list of glyph widths of a font
+:meth:`Document.isStream`            PDF only: check whether an :data:`xref` is a stream object
+:attr:`Document.FontInfos`           PDF only: information on inserted fonts
+:meth:`ImageProperties`              return a dictionary of basic image properties
+:meth:`getPDFnow`                    return the current timestamp in PDF format
+:meth:`getPDFstr`                    return PDF-compatible string
+:meth:`getTextlength`                return string length for a given font & fontsize
+:meth:`Page.cleanContents`           PDF only: clean the page's :data:`contents` objects
+:meth:`Page._getContents`            PDF only: return a list of content numbers
+:meth:`Page._setContents`            PDF only: set page's :data:`contents` to some :data:`xref`
+:meth:`Page.getDisplayList`          create the page's display list
+:meth:`Page.getTextBlocks`           extract text blocks as a Python list
+:meth:`Page.getTextWords`            extract text words as a Python list
+:meth:`Page.run`                     run a page through a device
+:meth:`Page.readContents`            PDF only: get complete, concatenated /Contents source
+:meth:`Page.wrapContents`            wrap contents with stacking commands
+:attr:`Page._isWrapped`              check whether contents wrapping is present
+:meth:`planishLine`                  matrix to map a line to the x-axis
+:meth:`PaperSize`                    return width, height for a known paper format
+:meth:`PaperRect`                    return rectangle for a known paper format
+:meth:`sRGB_to_pdf`                  return PDF RGB color tuple from a sRGB integer
+:meth:`sRGB_to_rgb`                  return (R, G, B) color tuple from a sRGB integer
+:meth:`make_table`                   return list of table cells for a given rectangle
+:attr:`paperSizes`                   dictionary of pre-defined paper formats
+==================================== ==============================================================
+
+   .. method:: PaperSize(s)
+
+      Convenience function to return width and height of a known paper format code. These values are given in pixels for the standard resolution 72 pixels = 1 inch.
+
+      Currently defined formats include **'A0'** through **'A10'**, **'B0'** through **'B10'**, **'C0'** through **'C10'**, **'Card-4x6'**, **'Card-5x7'**, **'Commercial'**, **'Executive'**, **'Invoice'**, **'Ledger'**, **'Legal'**, **'Legal-13'**, **'Letter'**, **'Monarch'** and **'Tabloid-Extra'**, each in either portrait or landscape format.
+
+      A format name must be supplied as a string (case **in** \sensitive), optionally suffixed with "-L" (landscape) or "-P" (portrait). No suffix defaults to portrait.
+
+      :arg str s: any format name from above (upper or lower case), like *"A4"* or *"letter-l"*.
+
+      :rtype: tuple
+      :returns: *(width, height)* of the paper format. For an unknown format *(-1, -1)* is returned. Examples: *fitz.PaperSize("A4")* returns *(595, 842)* and *fitz.PaperSize("letter-l")* delivers *(792, 612)*.
+
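The name handling described above (case-insensitive names, an optional "-L" / "-P" suffix, *(-1, -1)* for unknown names) can be sketched as follows. The two-entry table and the helper name ``paper_size`` are hypothetical; PyMuPDF's own table is :attr:`paperSizes`.

```python
# Hypothetical subset of paper formats: portrait (width, height) in points (72 = 1 inch)
PAPER_SIZES = {
    "a4": (595, 842),
    "letter": (612, 792),
}


def paper_size(s: str) -> tuple:
    """Return (width, height) for a format name like "A4" or "letter-l"."""
    s = s.lower()
    landscape = False
    if s.endswith("-l"):
        landscape, s = True, s[:-2]
    elif s.endswith("-p"):
        s = s[:-2]  # explicit portrait, same as no suffix
    if s not in PAPER_SIZES:
        return (-1, -1)
    width, height = PAPER_SIZES[s]
    # landscape simply swaps width and height
    return (height, width) if landscape else (width, height)


print(paper_size("A4"), paper_size("letter-l"))
```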
+-----
+
+   .. method:: PaperRect(s)
+
+      Convenience function to return a :ref:`Rect` for a known paper format.
+
+      :arg str s: any format name supported by :meth:`PaperSize`.
+
+      :rtype: :ref:`Rect`
+      :returns: *fitz.Rect(0, 0, width, height)* with *width, height=fitz.PaperSize(s)*.
+
+      >>> import fitz
+      >>> fitz.PaperRect("letter-l")
+      fitz.Rect(0.0, 0.0, 792.0, 612.0)
+      >>>
+
+-----
+
+   .. method:: sRGB_to_pdf(srgb)
+
+      *New in v1.17.4*
+
+      Convenience function returning a PDF color triple (red, green, blue) for a given sRGB color integer as it occurs in :meth:`Page.getText` dictionaries "dict" and "rawdict".
+
+      :arg int srgb: an integer of format RRGGBB, where each color component is an integer in range(256).
+
+      :returns: a tuple (red, green, blue) with float items in the interval *0 <= item <= 1* representing the same color.
+
+-----
+
+   .. method:: sRGB_to_rgb(srgb)
+
+      *New in v1.17.4*
+
+      Convenience function returning a color (red, green, blue) for a given sRGB color integer.
+
+      :arg int srgb: an integer of format RRGGBB, where each color component is an integer in range(256).
+
+      :returns: a tuple (red, green, blue) with integer items in the interval *0 <= item <= 255* representing the same color.
+
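Both conversions amount to splitting the integer into its three bytes. A minimal sketch of the arithmetic; the function names are hypothetical, chosen so as not to suggest this is PyMuPDF's code:

```python
def srgb_to_rgb_sketch(srgb: int) -> tuple:
    """Split an integer of format 0xRRGGBB into (red, green, blue), each in 0..255."""
    return (srgb >> 16) & 0xFF, (srgb >> 8) & 0xFF, srgb & 0xFF


def srgb_to_pdf_sketch(srgb: int) -> tuple:
    """Same color with float components in 0..1, as used for PDF color arguments."""
    return tuple(c / 255 for c in srgb_to_rgb_sketch(srgb))


print(srgb_to_rgb_sketch(0xFF8000), srgb_to_pdf_sketch(0xFF0000))
```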
+-----
+
+   .. method:: make_table(rect=(0, 0, 1, 1), cols=1, rows=1)
+
+      *New in v1.17.4*
+
+      Convenience function returning *rows* lists of *cols* :ref:`Rect` objects each, representing equal-sized table cells for the given rectangle.
+
+      :arg rect_like rect: the rectangle to contain the table.
+      :arg int cols: the desired number of columns.
+      :arg int rows: the desired number of rows.
+      :returns: a list of *rows* lists, each containing *cols* :ref:`Rect` objects of equal size, whose union equals *rect*::
+
+         [
+            [cell00, cell01, ...]  # row 0
+            ...
+            [...]  # last row
+         ]
+
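The cell geometry is a simple division of the rectangle's width by *cols* and its height by *rows*. A sketch using plain tuples ``(x0, y0, x1, y1)`` in place of :ref:`Rect`; the helper name is hypothetical:

```python
def make_table_sketch(rect=(0, 0, 1, 1), cols=1, rows=1):
    """Return rows x cols equal cells covering rect, as a list of rows."""
    x0, y0, x1, y1 = rect
    width = (x1 - x0) / cols
    height = (y1 - y0) / rows
    # the right edge of cell c and the left edge of cell c + 1 use the same
    # expression, so neighboring cells share their borders exactly
    return [
        [
            (x0 + c * width, y0 + r * height, x0 + (c + 1) * width, y0 + (r + 1) * height)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


print(make_table_sketch((0, 0, 100, 50), cols=2, rows=1))
```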
+-----
+
+   .. method:: planishLine(p1, p2)
+
+      *(New in version 1.16.2)*
+      
+      Return a matrix which maps the line from p1 to p2 to the x-axis such that p1 will become (0,0) and p2 a point with the same distance to (0,0).
+
+      :arg point_like p1: starting point of the line.
+      :arg point_like p2: end point of the line.
+
+      :rtype: :ref:`Matrix`
+      :returns:
+         
+         a matrix which combines a rotation and a translation::
+
+            >>> p1 = fitz.Point(1, 1)
+            >>> p2 = fitz.Point(4, 5)
+            >>> abs(p2 - p1)  # distance of points
+            5.0
+            >>> m = fitz.planishLine(p1, p2)
+            >>> p1 * m
+            Point(0.0, 0.0)
+            >>> p2 * m
+            Point(5.0, -5.960464477539063e-08)
+            >>> # distance of the resulting points
+            >>> abs(p2 * m - p1 * m)
+            5.0
+ 
+
+         .. image:: images/img-planish.png
+            :scale: 40
+
+
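The matrix can be derived by hand: translate by *-p1*, then rotate by the negative angle of the line. A sketch in plain Python using the PDF matrix convention *(a, b, c, d, e, f)*, i.e. ``x' = a*x + c*y + e`` and ``y' = b*x + d*y + f``. The helper names are hypothetical, and PyMuPDF's implementation may differ in detail (for example in rounding, as the tiny y value above shows).

```python
import math


def planish_matrix(p1, p2):
    """Return (a, b, c, d, e, f) mapping p1 to (0, 0) and p2 onto the positive x-axis."""
    x1, y1 = p1
    dx, dy = p2[0] - x1, p2[1] - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p1 and p2 must differ")
    cs, sn = dx / length, dy / length  # cosine and sine of the line's angle
    # translation by -p1 followed by a rotation by the negative angle
    return (cs, -sn, sn, cs, -(cs * x1 + sn * y1), sn * x1 - cs * y1)


def transform(point, m):
    """Apply a PDF-style matrix to a point given as (x, y)."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


m = planish_matrix((1, 1), (4, 5))
print(transform((1, 1), m), transform((4, 5), m))
```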
+-----
+
+   .. attribute:: paperSizes
+
+      A dictionary of predefined paper formats. Used as basis for :meth:`PaperSize`.
+
+-----
+
+   .. method:: getPDFnow()
+
+      Convenience function to return the current local timestamp in PDF compatible format, e.g. *D:20170501121525-04'00'* for local datetime May 1, 2017, 12:15:25 in a timezone 4 hours westward of the UTC meridian.
+
+      :rtype: str
+      :returns: current local PDF timestamp.
+
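The timestamp layout is ``D:YYYYMMDDHHmmSS`` followed by the UTC offset written as ``+HH'mm'`` or ``-HH'mm'``. A sketch with the standard ``datetime`` module; the helper name ``pdf_timestamp`` is hypothetical, and PyMuPDF may format a zero offset differently.

```python
from datetime import datetime, timedelta, timezone


def pdf_timestamp(dt=None):
    """Format a datetime (default: now, local time) as a PDF date string."""
    if dt is None:
        dt = datetime.now().astimezone()  # timezone-aware local time
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    offset = dt.utcoffset()
    if offset is None:  # naive datetime: no timezone part
        return stamp
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hh, mm = divmod(abs(minutes), 60)
    return "%s%s%02d'%02d'" % (stamp, sign, hh, mm)


may_first = datetime(2017, 5, 1, 12, 15, 25, tzinfo=timezone(timedelta(hours=-4)))
print(pdf_timestamp(may_first))
```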
+-----
+
+   .. method:: getTextlength(text, fontname="helv", fontsize=11, encoding=TEXT_ENCODING_LATIN)
+
+      *(New in version 1.14.7)*
+      
+      Calculate the length of text on output with a given **builtin** font, fontsize and encoding.
+
+      :arg str text: the text string.
+      :arg str fontname: the fontname. Must be one of either the :ref:`Base-14-Fonts` or the CJK fonts, identified by their "reserved" fontnames (see table in :meth:`Page.insertFont`).
+      :arg float fontsize: size of the font.
+      :arg int encoding: the encoding to use. Besides 0 = Latin, 1 = Greek and 2 = Cyrillic (Russian) are available. Relevant for Base-14 fonts "Helvetica", "Courier" and "Times" and their variants only. Make sure to use the same value as in the corresponding text insertion.
+      :rtype: float
+      :returns: the length in points the string will have (e.g. when used in :meth:`Page.insertText`).
+
+      .. note:: This function will only do the calculation -- it won't insert font or text.
+
+      .. warning:: If you use this function to determine the required rectangle width for the (:ref:`Page` or :ref:`Shape`) *insertTextbox* methods, be aware that they calculate on a **by-character level**. Because of rounding effects, this will mostly lead to a slightly larger number: *sum([fitz.getTextlength(c) for c in text]) > fitz.getTextlength(text)*. So either (1) do the same, or (2) use something like *fitz.getTextlength(text + "'")* for your calculation.
+
+-----
+
+   .. method:: getPDFstr(text)
+
+      Make a PDF-compatible string: if the text contains code points *ord(c) > 255*, then it will be converted to UTF-16BE with BOM as a hexadecimal character string enclosed in "<>" brackets like *<feff...>*. Otherwise, it will return the string enclosed in (round) brackets, replacing any characters outside the ASCII range with some special code. Also, every "(", ")" or backslash is escaped with an additional backslash.
+
+      :arg str text: the object to convert
+
+      :rtype: str
+      :returns: PDF-compatible string enclosed in either *()* or *<>*.
+
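A simplified sketch of the conversion rules described above. Writing single-byte characters outside printable ASCII as octal escapes ``\ddd`` is an assumption here (it is valid PDF string syntax), so PyMuPDF's exact output for such characters may differ. The helper name is hypothetical.

```python
def pdf_string(text: str) -> str:
    """Return text as a PDF string literal (hypothetical, simplified helper)."""
    if any(ord(c) > 255 for c in text):
        # UTF-16BE with byte order mark, written as a hexadecimal string
        return "<" + (b"\xfe\xff" + text.encode("utf-16-be")).hex() + ">"
    out = []
    for c in text:
        if c in "()\\":
            out.append("\\" + c)  # escape delimiters and the backslash itself
        elif 32 <= ord(c) < 127:
            out.append(c)  # printable ASCII is kept as-is
        else:
            out.append("\\%03o" % ord(c))  # other single bytes as octal escapes
    return "(" + "".join(out) + ")"


print(pdf_string("a(b)"), pdf_string("\u03a9"))
```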
+-----
+
+   .. method:: ImageProperties(stream)
+
+      *(New in version 1.14.14)*
+
+      Return a number of basic properties for an image.
+
+      :arg bytes|bytearray|BytesIO|file stream: an image either in memory or an **opened** file. A memory-resident image may be any of the formats *bytes*, *bytearray* or *io.BytesIO*.
+
+      :returns: a dictionary with the following keys (an empty dictionary for any error):
+
+         ========== ====================================================
+         **Key**    **Value**
+         ========== ====================================================
+         width      (int) width in pixels
+         height     (int) height in pixels
+         colorspace (int) colorspace.n (e.g. 3 = RGB)
+         bpc        (int) bits per component (usually 8)
+         format     (int) image format in *range(15)*
+         ext        (str) image file extension indicating the format
+         size       (int) length of the image in bytes
+         ========== ====================================================
+
+      Example:
+
+      >>> fitz.ImageProperties(open("img-clip.jpg","rb"))
+      {'bpc': 8, 'format': 9, 'colorspace': 3, 'height': 325, 'width': 244, 'ext': 'jpeg', 'size': 14161}
+      >>>
+
+
+-----
+
+   .. method:: ConversionHeader(output="text", filename="UNKNOWN")
+
+      Return the header string required to make a valid document out of page text outputs.
+
+      :arg str output: type of document. Use the same as the output parameter of *getText()*.
+
+      :arg str filename: optional arbitrary name to use in output types "json" and "xml".
+
+      :rtype: str
+
+-----
+
+   .. method:: ConversionTrailer(output)
+
+      Return the trailer string required to make a valid document out of page text outputs. See :meth:`Page.getText` for an example.
+
+      :arg str output: type of document. Use the same as the output parameter of *getText()*.
+
+      :rtype: str
+
+-----
+
+   .. method:: Document._deleteObject(xref)
+
+      PDF only: Delete an object given by its cross reference number.
+
+      :arg int xref: the cross reference number. Must be within the document's valid :data:`xref` range.
+
+      .. warning:: Only use with extreme care: this may make the PDF unreadable.
+
+-----
+
+   .. method:: Document._delXmlMetadata()
+
+      Delete an object containing XML-based metadata from the PDF. (Py-) MuPDF does not support XML-based metadata. Use this if you want to make sure that the conventional metadata dictionary will be used exclusively. Many third-party PDF programs insert their own metadata in XML format and thus may override what you store in the conventional dictionary. This method deletes any such reference, and the corresponding PDF object will be deleted during the next garbage collection of the file.
+
+-----
+
+   .. method:: Document._getTrailerString(compressed=False)
+
+      *(New in version 1.14.9)*
+      
+      Return the trailer of the PDF (UTF-8), which is usually located at the PDF file's end. If not a PDF or the PDF has no trailer (because of irrecoverable errors), *None* is returned.
+
+      :arg bool compressed: *(new in version 1.14.14)* whether to generate a compressed output or one with nice indentations to ease reading (default).
+
+      :returns: a string with the PDF trailer information. This is the analogous method to :meth:`Document._getXrefString` except that the trailer has no identifying :data:`xref` number. As can be seen here, the trailer object points to other important objects:
+
+      >>> doc=fitz.open("adobe.pdf")
+      >>> # compressed output
+      >>> print(doc._getTrailerString(True))
+      <</Size 334093/Prev 25807185/XRefStm 186352/Root 333277 0 R/Info 109959 0 R
+      /ID[(\227\366/gx\016ds\244\207\326\261\\\305\376u)
+      (H\323\177\346\371pkF\243\262\375\346\325\002)]>>
+      >>> # non-compressed output:
+      >>> print(doc._getTrailerString(False))
+      <<
+         /Size 334093
+         /Prev 25807185
+         /XRefStm 186352
+         /Root 333277 0 R
+         /Info 109959 0 R
+         /ID [ (\227\366/gx\016ds\244\207\326\261\\\305\376u) (H\323\177\346\371pkF\243\262\375\346\325\002) ]
+      >>
+
+      .. note:: MuPDF is capable of recovering from a number of damages a PDF may have. This includes re-generating a trailer, where the end of a file has been lost (e.g. because of incomplete downloads). If however *None* is returned for a PDF, then the recovery mechanisms were unsuccessful and you should check for any error messages (:attr:`Document.openErrCode`, :attr:`Document.openErrMsg`, :attr:`Tools.fitz_stderr`).
+
+
+-----
+
+   .. method:: Document._make_page_map()
+
+      Create an internal array of page numbers, which significantly speeds up page lookup (:meth:`Document.loadPage`). If this array exists, finding a page object will be up to two times faster. Functions which change the PDF's page layout (copy, delete, move, select pages) will destroy this array again.
+
+-----
+
+   .. method:: Document._getXmlMetadataXref()
+
+      Return the XML-based metadata :data:`xref` of the PDF if present -- also refer to :meth:`Document._delXmlMetadata`. You can use it to retrieve the content via :meth:`Document._getXrefStream` and then work with it using some XML software.
+
+      :rtype: int
+      :returns: :data:`xref` of PDF file level XML metadata, or 0 if there is none.
+
+-----
+
+   .. method:: Document._getPageObjNumber(pno)
+
+      or
+
+   .. method:: Document._getPageXref(pno)
+
+      Return the :data:`xref` and generation number for a given page.
+
+      :arg int pno: Page number (zero-based).
+
+      :rtype: list
+      :returns: :data:`xref` and generation number of page *pno* as a list *[xref, gen]*.
+
+-----
+
+   .. method:: Document._getPDFroot()
+
+      Return the :data:`xref` of the PDF catalog.
+
+      :rtype: int
+      :returns: :data:`xref` of the PDF catalog -- a central :data:`dictionary` pointing to many other PDF information.
+
+-----
+
+   .. method:: Page.run(dev, transform)
+
+      Run a page through a device.
+
+      :arg dev: Device, obtained from one of the :ref:`Device` constructors.
+      :type dev: :ref:`Device`
+
+      :arg transform: Transformation to apply to the page. Set it to :ref:`Identity` if no transformation is desired.
+      :type transform: :ref:`Matrix`
+
+-----
+
+   .. method:: Page.wrapContents()
+
+      Put the string "q" before and the string "Q" after a page's */Contents* object(s) to ensure that any "geometry" changes are **local** only.
+
+      Use this method as an alternative, minimalistic version of :meth:`Page.cleanContents`. Its advantage is a small footprint in terms of processing time and impact on incremental saves.
+
+-----
+
+   .. attribute:: Page._isWrapped
+
+      Indicate whether the page's contents are already wrapped. A value of *False* means that :meth:`Page.wrapContents` may be required for object insertions in standard PDF geometry. Please note that this is a quick, basic check only: a value of *False* may still be a false alarm.
+
+-----
+
+   .. method:: Page.getTextBlocks(flags=None)
+
+      Deprecated wrapper for :meth:`TextPage.extractBLOCKS`.
+
+-----
+
+   .. method:: Page.getTextWords(flags=None)
+
+      Deprecated wrapper for :meth:`TextPage.extractWORDS`.
+
+-----
+
+   .. method:: Page.getDisplayList()
+
+      Run a page through a list device and return its display list.
+
+      :rtype: :ref:`DisplayList`
+      :returns: the display list of the page.
+
+-----
+
+   .. method:: Page._getContents()
+
+      Return a list of :data:`xref` numbers of :data:`contents` objects belonging to the page.
+
+      :rtype: list
+      :returns: a list of :data:`xref` integers.
+
+      Each page may have zero to many associated contents objects (:data:`stream` \s) which contain some operator syntax describing what appears where and how on the page (like text or images, etc. See the :ref:`AdobeManual`, chapter "Operator Summary", page 985). This function only enumerates the number(s) of such objects. To get the actual stream source, use function :meth:`Document._getXrefStream` with one of the numbers in this list. Use :meth:`Document._updateStream` to replace the content.
+
+-----
+
+   .. method:: Page._setContents(xref)
+
+      PDF only: Set a given object (identified by its :data:`xref`) as the page's one and only :data:`contents` object. Useful for joining multiple :data:`contents` objects as in the following snippet::
+
+         >>> c = b""
+         >>> xreflist = page._getContents()
+         >>> for xref in xreflist:
+         ...     c += doc._getXrefStream(xref) + b"\n"  # newline keeps operators apart
+         ...
+         >>> doc._updateStream(xreflist[0], c)
+         >>> page._setContents(xreflist[0])
+         >>> # doc.save(..., garbage=1) will remove the unused objects
+
+      :arg int xref: the cross reference number of a :data:`contents` object. An exception is raised if outside the valid :data:`xref` range or not a stream object.
+
+-----
+
+   .. method:: Page.cleanContents()
+
+      Clean and concatenate all :data:`contents` objects associated with this page. "Cleaning" includes syntactical corrections, standardizations and "pretty printing" of the contents stream. Discrepancies between :data:`contents` and :data:`resources` objects will also be corrected. See :meth:`Page._getContents` for more details.
+
+      *(Changed in version 1.16.0)* Annotations are no longer implicitly cleaned by this method. Use :meth:`Annot._cleanContents` separately.
+
+      .. warning:: This is a complex function which may generate large amounts of new data and render other data unused. Using it together with the **incremental save** option is **not recommended**. Also note that the resulting singleton new */Contents* object is **uncompressed**. So you should save to a **new file** using options *"deflate=True, garbage=3"*.
+
+-----
+
+   .. method:: Page.readContents()
+
+      *New in version 1.17.0.*
+
+      Return the concatenation of all :data:`contents` objects associated with the page, without cleaning or otherwise modifying them. Use this method whenever you need to parse this source in its entirety without having to worry about how many separate contents objects exist.
+
+
+-----
+
+   .. method:: Annot._cleanContents()
+
+      Clean the :data:`contents` streams associated with the annotation. This is the same type of action which :meth:`Page.cleanContents` performs, just restricted to this annotation.
+
+
+-----
+
+   .. method:: Document.getCharWidths(xref=0, limit=256)
+
+      Return a list of character glyphs and their widths for a font that is present in the document. A font must be specified by its PDF cross reference number :data:`xref`. This function is called automatically from :meth:`Page.insertText` and :meth:`Page.insertTextbox`. So you should rarely need to do this yourself.
+
+      :arg int xref: cross reference number of a font embedded in the PDF. To find a font :data:`xref`, use e.g. *doc.getPageFontList(pno)* of page number *pno* and take the first entry of one of the returned list entries.
+
+      :arg int limit: limits the number of returned entries. The default of 256 is enforced for all fonts that only support 1-byte characters, so-called "simple fonts" (checked by this method). All :ref:`Base-14-Fonts` are simple fonts.
+
+      :rtype: list
+      :returns: a list of *limit* tuples. Each character *c* has an entry *(g, w)* in this list with an index of *ord(c)*. Entry *g* (integer) of the tuple is the glyph id of the character, and float *w* is its normalized width. The actual width for a given fontsize can be calculated as *w * fontsize*. For simple fonts, the *g* entry can always be safely ignored. In all other cases *g* is the basis for graphically representing *c*.
+
+      The following function calculates the width (in points) of a string *text*, given the list *widthlist* returned by this method::
+
+       def pixlen(text, widthlist, fontsize):
+           try:
+               # entry ord(c) is a tuple (glyph, width): use the width
+               return sum([widthlist[ord(c)][1] for c in text]) * fontsize
+           except IndexError:
+               m = max([ord(c) for c in text])
+               raise ValueError("max. code point found: %i, increase limit" % m)
+
+-----
+
+   .. method:: Document._getXrefString(xref, compressed=False)
+
+      Return the string ("source code") representing an arbitrary object. For :data:`stream` objects, only the non-stream part is returned. To get the stream data, use :meth:`_getXrefStream`.
+
+      :arg int xref: :data:`xref` number.
+      :arg bool compressed: *(new in version 1.14.14)* whether to generate a compressed output or one with nice indentations to ease reading or parsing (default).
+
+      :rtype: string
+      :returns: the string defining the object identified by :data:`xref`. Example:
+
+      >>> doc = fitz.open("Adobe PDF Reference 1-7.pdf")  # the PDF
+      >>> page = doc[100]  # some page in it
+      >>> print(doc._getXrefString(page.xref, compressed=True))
+      <</CropBox[0 0 531 666]/Annots[4795 0 R 4794 0 R 4793 0 R 4792 0 R 4797 0 R 4796 0 R]
+      /Parent 109820 0 R/StructParents 941/Contents 229 0 R/Rotate 0/MediaBox[0 0 531 666]
+      /Resources<</Font<</T1_0 3914 0 R/T1_1 3912 0 R/T1_2 3957 0 R/T1_3 3913 0 R/T1_4 4576 0 R
+      /T1_5 3931 0 R/T1_6 3944 0 R>>/ProcSet[/PDF/Text]/ExtGState<</GS0 333283 0 R>>>>
+      /Type/Page>>
+      >>> print(doc._getXrefString(page.xref, compressed=False))
+      <<
+         /CropBox [ 0 0 531 666 ]
+         /Annots [ 4795 0 R 4794 0 R 4793 0 R 4792 0 R 4797 0 R 4796 0 R ]
+         /Parent 109820 0 R
+         /StructParents 941
+         /Contents 229 0 R
+         /Rotate 0
+         /MediaBox [ 0 0 531 666 ]
+         /Resources <<
+            /Font <<
+               /T1_0 3914 0 R
+               /T1_1 3912 0 R
+               /T1_2 3957 0 R
+               /T1_3 3913 0 R
+               /T1_4 4576 0 R
+               /T1_5 3931 0 R
+               /T1_6 3944 0 R
+            >>
+            /ProcSet [ /PDF /Text ]
+            /ExtGState <<
+               /GS0 333283 0 R
+            >>
+         >>
+         /Type /Page
+      >>
+
+-----
+
+   .. method:: Document.isStream(xref)
+
+      *(New in version 1.14.14)*
+      
+      PDF only: Check whether the object represented by :data:`xref` is a :data:`stream` type. Returns *False* if the document is not a PDF or if the number is outside the valid xref range.
+
+      :arg int xref: :data:`xref` number.
+
+      :returns: *True* if the object definition is followed by data wrapped in keyword pair *stream*, *endstream*.
+
+-----
+
+   .. method:: Document._getNewXref()
+
+      Increase the :data:`xref` table by one entry and return the number of that entry. This number can then be used to insert a new object.
+
+      :rtype: int
+      :returns: the number of the new :data:`xref` entry.
+
+-----
+
+   .. method:: Document._updateObject(xref, obj_str, page=None)
+
+      Associate the object identified by string *obj_str* with *xref*, which must already exist. If *xref* pointed to an existing object, this will be replaced with the new object. If a page object is specified, links and other annotations of this page will be reloaded after the object has been updated.
+
+      :arg int xref: :data:`xref` number.
+
+      :arg str obj_str: a string containing a valid PDF object definition.
+
+      :arg page: a page object. If provided, indicates that annotations of this page should be refreshed (reloaded) to reflect changes made to links and / or annotations.
+      :type page: :ref:`Page`
+
+      :rtype: int
+      :returns: zero if successful, otherwise an exception will be raised.
+
+-----
+
+   .. method:: Document._getXrefLength()
+
+      Return the length of the :data:`xref` table.
+
+      :rtype: int
+      :returns: the number of entries in the :data:`xref` table.
+
+-----
+
+   .. method:: Document._getXrefStream(xref)
+
+      Return the decompressed stream of the object referenced by *xref*. For non-stream objects *None* is returned.
+
+      :arg int xref: :data:`xref` number.
+
+      :rtype: bytes
+      :returns: the (decompressed) stream of the object.
+
+-----
+
+   .. method:: Document._updateStream(xref, stream, new=False)
+
+      Replace the stream of an object identified by *xref*. If the object has no stream, an exception is raised unless *new=True* is used. The function automatically performs a compress operation ("deflate") where beneficial.
+
+      :arg int xref: :data:`xref` number.
+
+      :arg bytes|bytearray|BytesIO stream: the new content of the stream.
+
+         *(Changed in version 1.14.13:)* *io.BytesIO* objects are now also supported.
+
+      :arg bool new: whether to force accepting the stream, and thus **turning it into a stream object**.
+
+      This method is intended to manipulate streams containing PDF operator syntax (see p. 985 of the :ref:`AdobeManual`), as is the case for e.g. page content streams.
+
+      If you update a contents stream, you should use save parameter *clean=True*. This ensures consistency between PDF operator source and the object structure.
+
+      Example: Let us assume that you no longer want a certain image to appear on a page. This can be achieved by deleting the respective reference in its contents source(s) -- and indeed: the image will be gone after reloading the page. But the page's :data:`resources` object would still show the image as being referenced by the page. Saving with *clean=True* will clean up any such mismatches.
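+
+      In outline, such an edit could read a contents stream with :meth:`Document._getXrefStream`, remove the image invocation, write the result back with :meth:`Document._updateStream` and then save with *clean=True*. The following pure-Python sketch covers only the middle step. It assumes the image is painted by an operator of the form */Im1 Do*, where *Im1* is the image's key in the page's resources; the helper name is hypothetical, and the simple regular expression does not handle names containing *#xx* escapes or *Do* appearing inside strings.

```python
import re


def remove_image_invocation(contents: bytes, name: str) -> bytes:
    """Delete every '/<name> Do' operator from a decompressed contents stream.

    Hypothetical helper. 'name' is the key of the image in the page's
    /Resources /XObject dictionary (e.g. "Im1"). The surrounding q/cm/Q
    operators are left in place; they are harmless without the Do.
    """
    # The name must be followed by whitespace, so "/Im1" does not match "/Im10".
    pattern = rb"/" + re.escape(name.encode()) + rb"\s+Do\b"
    return re.sub(pattern, b"", contents)


stream = b"q 100 0 0 50 10 10 cm /Im1 Do Q\nq 20 0 0 20 0 0 cm /Im10 Do Q\n"
print(remove_image_invocation(stream, "Im1").decode())
```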
+
+-----
+
+   .. method:: Document._getOLRootNumber()
+
+      Return :data:`xref` number of the /Outlines root object (this is **not** the first outline entry!). If this object does not exist, a new one will be created.
+
+      :rtype: int
+      :returns: :data:`xref` number of the **/Outlines** root object.
+
+   .. method:: Document.extractImage(xref)
+
+      PDF Only: Extract data and meta information of an image stored in the document. The output can directly be used to be stored as an image file, as input for PIL, :ref:`Pixmap` creation, etc. This method avoids using pixmaps wherever possible to present the image in its original format (e.g. as JPEG).
+
+      :arg int xref: :data:`xref` of an image object. If this is not in *range(1, doc.xrefLength())*, or the object is not an image, or other errors occur, *None* is returned and no exception is raised.
+
+      :rtype: dict
+      :returns: a dictionary with the following keys
+
+        * *ext* (*str*) image type (e.g. *'jpeg'*), usable as image file extension
+        * *smask* (*int*) :data:`xref` number of a stencil (/SMask) image or zero
+        * *width* (*int*) image width
+        * *height* (*int*) image height
+        * *colorspace* (*int*) the image's *colorspace.n* number.
+        * *cs-name* (*str*) the image's *colorspace.name*.
+        * *xres* (*int*) resolution in x direction. Please also see :data:`resolution`.
+        * *yres* (*int*) resolution in y direction. Please also see :data:`resolution`.
+        * *image* (*bytes*) image data, usable as image file content
+
+      >>> d = doc.extractImage(1373)
+      >>> d
+      {'ext': 'png', 'smask': 2934, 'width': 5, 'height': 629, 'colorspace': 3, 'xres': 96,
+      'yres': 96, 'cs-name': 'DeviceRGB',
+      'image': b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x05\ ...'}
+      >>> imgout = open("image." + d["ext"], "wb")
+      >>> imgout.write(d["image"])
+      102
+      >>> imgout.close()
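+
+      A slightly more defensive variant of the example above, written as a hypothetical helper (not part of PyMuPDF): it honors the *None* return value of :meth:`Document.extractImage` and uses a *with* statement so that the file is always closed.

```python
from typing import Optional


def save_extracted_image(d: Optional[dict], basename: str) -> Optional[str]:
    """Write the dict returned by Document.extractImage() to a file.

    Hypothetical helper. Returns the file name written, or None if
    extractImage() returned None (it signals errors that way instead of
    raising an exception).
    """
    if d is None:
        return None
    filename = "%s.%s" % (basename, d["ext"])
    with open(filename, "wb") as f:  # closes the file even on errors
        f.write(d["image"])
    return filename


# Usage with a PyMuPDF document (not executed here):
# save_extracted_image(doc.extractImage(xref), "img-%i" % xref)
```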
+
+      .. note:: There is a functional overlap with *pix = fitz.Pixmap(doc, xref)*, followed by a *pix.getPNGData()*. The main differences are that extractImage **(1)** also delivers image formats other than PNG, **(2)** is **very** much faster with non-PNG images, **(3)** usually needs much less disk space for extracted images, and **(4)** returns *None* in error cases (generates no exception). Look at the following example images within the same PDF.
+
+         * xref 1268 is a PNG -- Comparable execution time and identical output::
+
+            In [23]: %timeit pix = fitz.Pixmap(doc, 1268);pix.getPNGData()
+            10.8 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+            In [24]: len(pix.getPNGData())
+            Out[24]: 21462
+
+            In [25]: %timeit img = doc.extractImage(1268)
+            10.8 ms ± 86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+            In [26]: len(img["image"])
+            Out[26]: 21462
+
+         * xref 1186 is a JPEG -- :meth:`Document.extractImage` is **many times faster** and produces a **much smaller** output (2.48 MB vs. 0.35 MB)::
+
+            In [27]: %timeit pix = fitz.Pixmap(doc, 1186);pix.getPNGData()
+            341 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+            In [28]: len(pix.getPNGData())
+            Out[28]: 2599433
+
+            In [29]: %timeit img = doc.extractImage(1186)
+            15.7 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
+            In [30]: len(img["image"])
+            Out[30]: 371177
+
+   .. method:: Document.extractFont(xref, info_only=False)
+
+      PDF Only: Return an embedded font file's data and appropriate file extension. This can be used to store the font as an external file. The method does not throw exceptions (other than those raised by checking for a PDF and a valid :data:`xref`).
+
+      :arg int xref: PDF object number of the font to extract.
+      :arg bool info_only: only return font information, not the buffer. To be used for information-only purposes, avoids allocation of large buffer areas.
+
+      :rtype: tuple
+      :returns: a tuple *(basename, ext, subtype, buffer)*, where *ext* is a 3-byte suggested file extension (*str*), *basename* is the font's name (*str*), *subtype* is the font's type (e.g. "Type1") and *buffer* is a bytes object containing the font file's content (or *b""*). For possible extension values and their meaning see :ref:`FontExtensions`. Return details on error:
+
+            * *("", "", "", b"")* -- invalid xref or xref is not a (valid) font object.
+            * *(basename, "n/a", "Type1", b"")* -- *basename* is one of the :ref:`Base-14-Fonts`, which cannot be extracted.
+
+      Example:
+
+      >>> # store font as an external file
+      >>> name, ext, _, buffer = doc.extractFont(4711)
+      >>> # assuming buffer is not empty:
+      >>> ofile = open(name + "." + ext, "wb")
+      >>> ofile.write(buffer)
+      >>> ofile.close()
+
+      .. warning:: The basename is returned unchanged from the PDF. So it may contain characters (such as blanks) which may disqualify it as a filename for your operating system. Take appropriate action.
+
+      .. note:: The returned *basename* is in general **not** the original file name, although it is probably similar to it.
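+
+      One possible way to "take appropriate action" is to replace every character outside a conservative set by an underscore. The helper below is a hypothetical sketch; which characters are safe depends on your operating system, and the chosen set is an assumption.

```python
import re


def font_filename(basename: str, ext: str) -> str:
    """Turn an extracted font's basename into a conservative file name.

    Hypothetical helper: every character other than ASCII letters, digits,
    '.', '_', '+' and '-' is replaced by '_'. An empty name becomes "font".
    """
    safe = re.sub(r"[^A-Za-z0-9._+-]", "_", basename)
    return "%s.%s" % (safe or "font", ext)


print(font_filename("ABCDEF+Times New Roman", "ttf"))
# ABCDEF+Times_New_Roman.ttf
```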
+
+   .. attribute:: Document.FontInfos
+
+      Contains the following information for any font inserted via :meth:`Page.insertFont` in **this** session of PyMuPDF:
+
+      * xref *(int)* -- XREF number of the */Type/Font* object.
+      * info *(dict)* -- detailed font information with the following keys:
+
+        * name *(str)* -- name of the basefont
+        * idx *(int)* -- index number for multi-font files
+        * type *(str)* -- font type (like "TrueType", "Type0", etc.)
+        * ext *(str)* -- extension to be used, when font is extracted to a file (see :ref:`FontExtensions`).
+        * glyphs (*list*) -- list of glyph numbers and widths (filled by text insertion methods).
+
+      :rtype: list
+
diff --git a/docs/glossary.rst b/docs/glossary.rst

new file mode 100644 (file)

index 0000000..73fd649
--- /dev/null
+++ b/docs/glossary.rst
@@ -0,0 +1,118 @@
+==============
+Glossary
+==============
+
+.. data:: matrix_like
+
+        A Python sequence of 6 numbers.
+
+.. data:: rect_like
+
+        A Python sequence of 4 numbers.
+
+.. data:: irect_like
+
+        A Python sequence of 4 integers.
+
+.. data:: point_like
+
+        A Python sequence of 2 numbers.
+
+.. data:: quad_like
+
+        A Python sequence of 4 :data:`point_like` items.
+
+.. data:: inheritable
+
+        A number of values in a PDF can be specified once and then be inherited by objects further down in a parent-child relationship. The mediabox (physical size) of pages can for example be specified in nodes of the :data:`pagetree` and will then be taken as value for all *kids*, which do not specify their own value.
+
+.. data:: MediaBox
+
+        A PDF array of 4 floats specifying a physical page size (:data:`inheritable`).
+
+.. data:: CropBox
+
+        A PDF array of 4 floats specifying a page's visible area (:data:`inheritable`). This value is **not affected** if the page is rotated.
+
+
+.. data:: catalog
+
+        A central PDF :data:`dictionary` -- also called "root" -- containing pointers to many other objects and pieces of information.
+
+.. data:: contents
+
+        "A **content stream** is a PDF :data:`stream` :data:`object` whose data consists of a sequence of instructions describing the graphical elements to be painted on a page." (:ref:`AdobeManual` p. 151). For an overview of the mini-language used in these streams see chapter "Operator Summary" on page 985 of the :ref:`AdobeManual`. A PDF :data:`page` can have zero or more contents objects. If it has none, the page is empty (but may still show annotations). If it has several, they are interpreted in sequence as if their instructions had been present in one such object (i.e. like a concatenated string). Note that other stream object types use the same syntax, e.g. the appearance streams of annotations and Form XObjects.
+
+.. data:: resources
+
+        A :data:`dictionary` containing references to any resources (like images or fonts) required by a PDF :data:`page` (required, inheritable, :ref:`AdobeManual` p. 145) and certain other objects (Form XObjects). This dictionary appears as a sub-dictionary in the object definition under the key */Resources*. Being an inheritable object type, there may exist "parent" resources for all pages or certain subsets of pages.
+
+.. data:: dictionary
+
+        A PDF :data:`object` type, which is somewhat comparable to the same-named Python notion: "A dictionary object is an associative table containing pairs of objects, known as the dictionary's entries. The first element of each entry is the key and the second element is the value. The key must be a name (...). The value can be any kind of object, including another dictionary. A dictionary entry whose value is null (...) is equivalent to an absent entry." (:ref:`AdobeManual` p. 59).
+
+        Dictionaries are the most important :data:`object` type in PDF. Here is an example (describing a :data:`page`)::
+
+            <<
+            /Contents 40 0 R                  % value: an indirect object
+            /Type/Page                        % value: a name object
+            /MediaBox[0 0 595.32 841.92]      % value: an array object
+            /Rotate 0                         % value: a number object
+            /Parent 12 0 R                    % value: an indirect object
+            /Resources<<                      % value: a dictionary object
+                /ExtGState<</R7 26 0 R>>
+                /Font<<
+                     /R8 27 0 R/R10 21 0 R/R12 24 0 R/R14 15 0 R
+                     /R17 4 0 R/R20 30 0 R/R23 7 0 R /R27 20 0 R
+                     >>
+                /ProcSet[/PDF/Text]           % value: array of two name objects
+                >>
+            /Annots[55 0 R]                   % value: array, one entry (indirect object)
+            >>
+
+        *Contents*, *Type*, *MediaBox*, etc. are **keys**; *40 0 R*, *Page*, *[0 0 595.32 841.92]*, etc. are the respective **values**. The strings *"<<"* and *">>"* enclose dictionary definitions.
+
+        This example also shows the syntax of **nested** dictionary values: *Resources* has an object as its value, which in turn is a dictionary with keys like *ExtGState* (with the value *<</R7 26 0 R>>*, which is another dictionary), etc.
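+
+        To make the correspondence with Python dictionaries concrete, here is a hypothetical serializer (not a PyMuPDF function) that turns nested Python values into PDF syntax. It assumes that Python strings are already-formatted PDF tokens such as *"/Page"* or *"40 0 R"*; real PDF string objects (in parentheses) are therefore not escaped. A string produced this way has the shape expected by the *obj_str* argument of :meth:`Document._updateObject`, but the sketch does not validate anything.

```python
def to_pdf(obj) -> str:
    """Serialize a Python value into PDF object syntax (hypothetical helper).

    Assumptions: dict -> PDF dictionary, list/tuple -> PDF array,
    bool -> true/false, None -> null, int/float -> number, and str values
    are taken as already-formatted PDF tokens such as "/Page" or "40 0 R".
    """
    if isinstance(obj, dict):
        return "<<" + "".join("/%s %s" % (k, to_pdf(v)) for k, v in obj.items()) + ">>"
    if isinstance(obj, (list, tuple)):
        return "[" + " ".join(to_pdf(v) for v in obj) + "]"
    if isinstance(obj, bool):  # test before int: bool is a subclass of int
        return "true" if obj else "false"
    if obj is None:
        return "null"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):
        # PDF has no exponent notation, so use fixed-point and trim zeros
        return ("%f" % obj).rstrip("0").rstrip(".")
    if isinstance(obj, str):
        return obj
    raise TypeError("unsupported type: %r" % type(obj))


page = {"Type": "/Page", "MediaBox": [0, 0, 595.32, 841.92], "Rotate": 0, "Contents": "40 0 R"}
print(to_pdf(page))
# <</Type /Page/MediaBox [0 0 595.32 841.92]/Rotate 0/Contents 40 0 R>>
```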
+
+.. data:: page
+
+        A PDF page is a :data:`dictionary` object which defines one page in a PDF, see :ref:`AdobeManual` p. 145.
+
+.. data:: pagetree
+
+        "The pages of a document are accessed through a structure known as the page tree, which defines the ordering of pages in the document. The tree structure allows PDF consumer applications, using only limited memory, to quickly open a document containing thousands of pages. The tree contains nodes of two types: intermediate nodes, called page tree nodes, and leaf nodes, called page objects." (:ref:`AdobeManual` p. 143).
+
+        While it is possible to list all page references in just one array, PDFs with many pages are often created using *balanced tree* structures ("page trees") for faster access to any single page. In relation to the total number of pages, this can reduce the average page access time by page number from a linear to some logarithmic order of magnitude.
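+
+        The following pure-Python sketch shows why a balanced tree helps: using the */Count* entry of each intermediate node (the number of leaf pages below it), a page can be found by descending one branch per level instead of scanning all pages. Nodes are modeled as plain Python dicts, which is an assumption for illustration; real page trees consist of PDF objects, and *find_page* is a hypothetical name.

```python
from typing import Any, Dict

Node = Dict[str, Any]  # hypothetical in-memory model, not real PDF objects


def find_page(root: Node, pno: int) -> Node:
    """Return the leaf /Page for 0-based page number pno.

    Intermediate nodes look like {"Type": "Pages", "Count": n, "Kids": [...]},
    where Count is the number of leaf pages below that node; leaves look like
    {"Type": "Page", ...}. Only one kid list per tree level is scanned, so for
    a balanced tree with bounded fan-out the work grows roughly
    logarithmically with the page count.
    """
    if not 0 <= pno < root["Count"]:
        raise IndexError("page number out of range")
    node = root
    while node["Type"] == "Pages":
        for kid in node["Kids"]:
            n = kid["Count"] if kid["Type"] == "Pages" else 1
            if pno < n:      # the wanted page lies below this kid
                node = kid
                break
            pno -= n         # skip all pages below this kid
        else:
            raise ValueError("inconsistent /Count entries in page tree")
    return node


pages = [{"Type": "Page", "id": i} for i in range(7)]
root = {"Type": "Pages", "Count": 7, "Kids": [
    {"Type": "Pages", "Count": 3, "Kids": pages[:3]},
    {"Type": "Pages", "Count": 4, "Kids": [
        {"Type": "Pages", "Count": 2, "Kids": pages[3:5]}, pages[5], pages[6]]},
]}
print(find_page(root, 5)["id"])  # 5
```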
+
+        For fast page access, MuPDF can use its own array in memory -- independently from what may or may not be present in the document file. This array is indexed by page number and therefore much faster than even the access via a perfectly balanced page tree.
+
+.. data:: object
+
+        Similar to Python, PDF supports the notion *object*, which can come in eight basic types: boolean values, integer and real numbers, strings, names, arrays, dictionaries, streams, and the null object (:ref:`AdobeManual` p. 51). Objects can be made identifiable by assigning a label; an object labeled this way is called an *indirect* object. PyMuPDF supports retrieving the definition of an indirect object by its cross reference number via :meth:`Document.xrefObject`.
+
+.. data:: stream
+
+        A PDF :data:`object` type which is a sequence of bytes, similar to a string. "However, a PDF application can read a stream incrementally, while a string must be read in its entirety. Furthermore, a stream can be of unlimited length, whereas a string is subject to an implementation limit. For this reason, objects with potentially large amounts of data, such as images and page descriptions, are represented as streams." "A stream consists of a :data:`dictionary` followed by zero or more bytes bracketed between the keywords *stream* and *endstream*"::
+
+            nnn 0 obj
+            <<
+               dictionary definition
+            >>
+            stream
+            (zero or more bytes)
+            endstream
+            endobj
+
+        See :ref:`AdobeManual` p. 60. PyMuPDF supports retrieving stream content via :meth:`Document.xrefStream`. Use :meth:`Document.isStream` to determine whether an object is of stream type.
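+
+        As an illustration of this layout, the hypothetical helper below assembles the source of a stream object, including the */Length* entry, which gives the number of bytes between the end-of-line after *stream* and the end-of-line before *endstream*. It writes uncompressed data; additional dictionary entries (such as a */Filter*) are passed in as ready-made PDF syntax, which is an assumption of this sketch.

```python
def make_stream_object(num: int, data: bytes, extra: str = "") -> bytes:
    """Build the source of a PDF stream object (hypothetical helper).

    /Length is the number of data bytes, i.e. the bytes between the
    end-of-line after 'stream' and the end-of-line before 'endstream'.
    'extra' holds further dictionary entries as ready-made PDF syntax.
    """
    entries = "/Length %i%s" % (len(data), (" " + extra) if extra else "")
    header = "%i 0 obj\n<<%s>>\nstream\n" % (num, entries)
    return header.encode("latin-1") + data + b"\nendstream\nendobj\n"


print(make_stream_object(12, b"BT /F1 11 Tf (Hi) Tj ET").decode())
```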
+
+.. data:: unitvector
+
+        A mathematical notion meaning a vector of norm ("length") 1 -- usually the Euclidean norm is implied. In PyMuPDF, this term is restricted to :ref:`Point` objects, see :attr:`Point.unit`.
+
+.. data:: xref
+
+        Abbreviation for cross-reference number: this is a unique integer identifying an object in a PDF. Each PDF contains a cross-reference table (which may physically consist of several separate segments) that stores the relative position of each object for quick lookup. The cross-reference table is one entry longer than the number of existing objects: item zero is reserved and must not be used in any way. Many PyMuPDF classes have an *xref* attribute (which is zero for non-PDFs), and one can find out the total number of objects in a PDF via :meth:`Document.xrefLength` *- 1*.
+
+.. data:: resolution
+
+        Images and :ref:`Pixmap` objects may contain resolution information provided as "dots per inch", dpi, in each direction (horizontal and vertical). When MuPDF reads an image from a file or from a PDF object, it will parse this information and put it in :attr:`Pixmap.xres` and :attr:`Pixmap.yres`, respectively. When it finds no meaningful information in the input (like non-positive values or values exceeding 4800), it will use "sane" defaults instead. The usual default value is 96, but it may also be 72 in some cases (e.g. for JPX images).
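+
+        As a rough illustration, the sketch below applies a fallback rule of this kind and converts pixel dimensions into a physical size in points (1 inch = 72 points). It is not MuPDF's code: the function names are hypothetical, and the assumption that both values are replaced as soon as either one is implausible is made only for this example.

```python
from typing import Tuple

SANE_MAX_DPI = 4800  # upper bound quoted in the glossary text


def effective_resolution(xres: int, yres: int, default: int = 96) -> Tuple[int, int]:
    """Approximate the fallback rule (assumption, not MuPDF's implementation).

    If either value is non-positive or exceeds SANE_MAX_DPI, both are
    replaced by the default.
    """
    if not (0 < xres <= SANE_MAX_DPI and 0 < yres <= SANE_MAX_DPI):
        return default, default
    return xres, yres


def size_in_points(width: int, height: int, xres: int, yres: int) -> Tuple[float, float]:
    """Physical image size in PDF points: pixels / dpi * 72."""
    xres, yres = effective_resolution(xres, yres)
    return width * 72 / xres, height * 72 / yres


print(size_in_points(960, 480, 96, 96))  # (720.0, 360.0)
print(size_in_points(960, 480, 0, 0))    # falls back to 96 dpi: (720.0, 360.0)
```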
diff --git a/docs/identity.rst b/docs/identity.rst

new file mode 100644 (file)

index 0000000..03d8d0e
--- /dev/null
+++ b/docs/identity.rst
@@ -0,0 +1,16 @@
+.. _Identity:
+
+============
+Identity
+============
+
+Identity is a :ref:`Matrix` that performs no action -- to be used whenever the syntax requires a matrix, but no actual transformation should take place. It has the form *fitz.Matrix(1, 0, 0, 1, 0, 0)*.
+
+Identity is a constant, an "immutable" object. So, all of its matrix properties are read-only and its methods are disabled.
+
+If you need a **mutable** identity matrix as a starting point, use one of the following statements::
+
+    >>> m = fitz.Matrix(1, 0, 0, 1, 0, 0)  # specify the values
+    >>> m = fitz.Matrix(1, 1)              # use scaling by factor 1
+    >>> m = fitz.Matrix(0)                 # use rotation by zero degrees
+    >>> m = fitz.Matrix(fitz.Identity)     # make a copy of Identity
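+
+The following pure-Python sketch (not PyMuPDF's implementation) shows why such a matrix leaves everything unchanged. It uses the PDF convention that a matrix ``(a, b, c, d, e, f)`` maps a point ``(x, y)`` to ``(a*x + c*y + e, b*x + d*y + f)``; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def apply(m: Sequence[float], p: Tuple[float, float]) -> Tuple[float, float]:
    """Transform point p by matrix m = (a, b, c, d, e, f), PDF convention."""
    a, b, c, d, e, f = m
    x, y = p
    return a * x + c * y + e, b * x + d * y + f


def concat(m1: Sequence[float], m2: Sequence[float]) -> Tuple[float, ...]:
    """Matrix product m1 * m2: applying the result means m1 first, then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


IDENTITY = (1, 0, 0, 1, 0, 0)
m = (2, 0, 0, 3, 10, 20)         # scale x by 2, y by 3, then shift by (10, 20)
print(apply(IDENTITY, (5, 7)))   # (5, 7)
print(concat(m, IDENTITY) == m)  # True
```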
diff --git a/docs/images/img-4up.png b/docs/images/img-4up.png

new file mode 100644 (file)

index 0000000..f526446

Binary files /dev/null and b/docs/images/img-4up.png differ
diff --git a/docs/images/img-7edges.png b/docs/images/img-7edges.png

new file mode 100644 (file)

index 0000000..957433e

Binary files /dev/null and b/docs/images/img-7edges.png differ
diff --git a/docs/images/img-a-is--1.png b/docs/images/img-a-is--1.png

new file mode 100644 (file)

index 0000000..86c4139

Binary files /dev/null and b/docs/images/img-a-is--1.png differ
diff --git a/docs/images/img-adobe.png b/docs/images/img-adobe.png

new file mode 100644 (file)

index 0000000..32492f9

Binary files /dev/null and b/docs/images/img-adobe.png differ
diff --git a/docs/images/img-alpha-0.png b/docs/images/img-alpha-0.png

new file mode 100644 (file)

index 0000000..bde6f53

Binary files /dev/null and b/docs/images/img-alpha-0.png differ
diff --git a/docs/images/img-alpha-1.png b/docs/images/img-alpha-1.png

new file mode 100644 (file)

index 0000000..0f3e077

Binary files /dev/null and b/docs/images/img-alpha-1.png differ
diff --git a/docs/images/img-annots.jpg b/docs/images/img-annots.jpg

new file mode 100644 (file)

index 0000000..1ff17af

Binary files /dev/null and b/docs/images/img-annots.jpg differ
diff --git a/docs/images/img-attach-result.jpg b/docs/images/img-attach-result.jpg

new file mode 100644 (file)

index 0000000..9ce2fd8

Binary files /dev/null and b/docs/images/img-attach-result.jpg differ
diff --git a/docs/images/img-b-is-0.5.png b/docs/images/img-b-is-0.5.png

new file mode 100644 (file)

index 0000000..4f256dd

Binary files /dev/null and b/docs/images/img-b-is-0.5.png differ
diff --git a/docs/images/img-binsetupdirs.png b/docs/images/img-binsetupdirs.png

new file mode 100644 (file)

index 0000000..4cd036b

Binary files /dev/null and b/docs/images/img-binsetupdirs.png differ
diff --git a/docs/images/img-breadth.png b/docs/images/img-breadth.png

new file mode 100644 (file)

index 0000000..6b39d2e

Binary files /dev/null and b/docs/images/img-breadth.png differ
diff --git a/docs/images/img-c-is-0.5.png b/docs/images/img-c-is-0.5.png

new file mode 100644 (file)

index 0000000..215fa54

Binary files /dev/null and b/docs/images/img-c-is-0.5.png differ
diff --git a/docs/images/img-cake.png b/docs/images/img-cake.png

new file mode 100644 (file)

index 0000000..f1151dd

Binary files /dev/null and b/docs/images/img-cake.png differ
diff --git a/docs/images/img-caret-annot.jpg b/docs/images/img-caret-annot.jpg

new file mode 100644 (file)

index 0000000..5889089

Binary files /dev/null and b/docs/images/img-caret-annot.jpg differ
diff --git a/docs/images/img-circle.png b/docs/images/img-circle.png

new file mode 100644 (file)

index 0000000..e87aba3

Binary files /dev/null and b/docs/images/img-circle.png differ
diff --git a/docs/images/img-clip.jpg b/docs/images/img-clip.jpg

new file mode 100644 (file)

index 0000000..e6a1fe4

Binary files /dev/null and b/docs/images/img-clip.jpg differ
diff --git a/docs/images/img-colordb.png b/docs/images/img-colordb.png

new file mode 100644 (file)

index 0000000..91f72a5

Binary files /dev/null and b/docs/images/img-colordb.png differ
diff --git a/docs/images/img-copy-speed-1.png b/docs/images/img-copy-speed-1.png

new file mode 100644 (file)

index 0000000..caab390

Binary files /dev/null and b/docs/images/img-copy-speed-1.png differ
diff --git a/docs/images/img-copy-speed-2.png b/docs/images/img-copy-speed-2.png

new file mode 100644 (file)

index 0000000..6eed76e

Binary files /dev/null and b/docs/images/img-copy-speed-2.png differ
diff --git a/docs/images/img-d-is--1.png b/docs/images/img-d-is--1.png

new file mode 100644 (file)

index 0000000..2371c49

Binary files /dev/null and b/docs/images/img-d-is--1.png differ
diff --git a/docs/images/img-drawBezier.png b/docs/images/img-drawBezier.png

new file mode 100644 (file)

index 0000000..a5b680f

Binary files /dev/null and b/docs/images/img-drawBezier.png differ
diff --git a/docs/images/img-drawCurve.png b/docs/images/img-drawCurve.png

new file mode 100644 (file)

index 0000000..d9ea180

Binary files /dev/null and b/docs/images/img-drawCurve.png differ
diff --git a/docs/images/img-drawSector1.png b/docs/images/img-drawSector1.png

new file mode 100644 (file)

index 0000000..f3afb94

Binary files /dev/null and b/docs/images/img-drawSector1.png differ
diff --git a/docs/images/img-drawSector2.png b/docs/images/img-drawSector2.png

new file mode 100644 (file)

index 0000000..52e6933

Binary files /dev/null and b/docs/images/img-drawSector2.png differ
diff --git a/docs/images/img-drawcircle.jpg b/docs/images/img-drawcircle.jpg

new file mode 100644 (file)

index 0000000..b8b0a8e

Binary files /dev/null and b/docs/images/img-drawcircle.jpg differ
diff --git a/docs/images/img-drawquad.jpg b/docs/images/img-drawquad.jpg

new file mode 100644 (file)

index 0000000..2513287

Binary files /dev/null and b/docs/images/img-drawquad.jpg differ
diff --git a/docs/images/img-e-is-100.png b/docs/images/img-e-is-100.png

new file mode 100644 (file)

index 0000000..db6e877

Binary files /dev/null and b/docs/images/img-e-is-100.png differ
diff --git a/docs/images/img-embed-progress.jpg b/docs/images/img-embed-progress.jpg

new file mode 100644 (file)

index 0000000..c37fe26

Binary files /dev/null and b/docs/images/img-embed-progress.jpg differ
diff --git a/docs/images/img-encoding.jpg b/docs/images/img-encoding.jpg

new file mode 100644 (file)

index 0000000..02ce105

Binary files /dev/null and b/docs/images/img-encoding.jpg differ
diff --git a/docs/images/img-encrypting.jpg b/docs/images/img-encrypting.jpg

new file mode 100644 (file)

index 0000000..81e747a

Binary files /dev/null and b/docs/images/img-encrypting.jpg differ
diff --git a/docs/images/img-even-odd.png b/docs/images/img-even-odd.png

new file mode 100644 (file)

index 0000000..a959c43

Binary files /dev/null and b/docs/images/img-even-odd.png differ
diff --git a/docs/images/img-extract-imga.jpg b/docs/images/img-extract-imga.jpg

new file mode 100644 (file)

index 0000000..1ab90a8

Binary files /dev/null and b/docs/images/img-extract-imga.jpg differ
diff --git a/docs/images/img-extract-imgb.jpg b/docs/images/img-extract-imgb.jpg

new file mode 100644 (file)

index 0000000..d439250

Binary files /dev/null and b/docs/images/img-extract-imgb.jpg differ
diff --git a/docs/images/img-f-is-100.png b/docs/images/img-f-is-100.png

new file mode 100644 (file)

index 0000000..61a8fac

Binary files /dev/null and b/docs/images/img-f-is-100.png differ
diff --git a/docs/images/img-filesizes.png b/docs/images/img-filesizes.png

new file mode 100644 (file)

index 0000000..34e7f38

Binary files /dev/null and b/docs/images/img-filesizes.png differ
diff --git a/docs/images/img-freetext.jpg b/docs/images/img-freetext.jpg

new file mode 100644 (file)

index 0000000..1766dd4

Binary files /dev/null and b/docs/images/img-freetext.jpg differ
diff --git a/docs/images/img-import-progress.jpg b/docs/images/img-import-progress.jpg

new file mode 100644 (file)

index 0000000..36bc223

Binary files /dev/null and b/docs/images/img-import-progress.jpg differ
diff --git a/docs/images/img-inkannot.jpg b/docs/images/img-inkannot.jpg

new file mode 100644 (file)

index 0000000..0ea913c

Binary files /dev/null and b/docs/images/img-inkannot.jpg differ
diff --git a/docs/images/img-inserttext.jpg b/docs/images/img-inserttext.jpg

new file mode 100644 (file)

index 0000000..c5bd3fd

Binary files /dev/null and b/docs/images/img-inserttext.jpg differ
diff --git a/docs/images/img-markedpdf.jpg b/docs/images/img-markedpdf.jpg

new file mode 100644 (file)

index 0000000..9860354

Binary files /dev/null and b/docs/images/img-markedpdf.jpg differ
diff --git a/docs/images/img-markers.jpg b/docs/images/img-markers.jpg

new file mode 100644 (file)

index 0000000..6766b5d

Binary files /dev/null and b/docs/images/img-markers.jpg differ
diff --git a/docs/images/img-matrix.png b/docs/images/img-matrix.png

new file mode 100644 (file)

index 0000000..d10ce90

Binary files /dev/null and b/docs/images/img-matrix.png differ
diff --git a/docs/images/img-opacity.jpg b/docs/images/img-opacity.jpg

new file mode 100644 (file)

index 0000000..beb011e

Binary files /dev/null and b/docs/images/img-opacity.jpg differ
diff --git a/docs/images/img-original.png b/docs/images/img-original.png

new file mode 100644 (file)

index 0000000..5c35196

Binary files /dev/null and b/docs/images/img-original.png differ
diff --git a/docs/images/img-pdfjoiner.jpg b/docs/images/img-pdfjoiner.jpg

new file mode 100644 (file)

index 0000000..e16cada

Binary files /dev/null and b/docs/images/img-pdfjoiner.jpg differ
diff --git a/docs/images/img-pdftext.jpg b/docs/images/img-pdftext.jpg

new file mode 100644 (file)

index 0000000..36b82cd

Binary files /dev/null and b/docs/images/img-pdftext.jpg differ
diff --git a/docs/images/img-planish.png b/docs/images/img-planish.png

new file mode 100644 (file)

index 0000000..84f32d2

Binary files /dev/null and b/docs/images/img-planish.png differ
diff --git a/docs/images/img-point-unit.jpg b/docs/images/img-point-unit.jpg

new file mode 100644 (file)

index 0000000..476af71

Binary files /dev/null and b/docs/images/img-point-unit.jpg differ
diff --git a/docs/images/img-polyline.png b/docs/images/img-polyline.png

new file mode 100644 (file)

index 0000000..ac5817d

Binary files /dev/null and b/docs/images/img-polyline.png differ
diff --git a/docs/images/img-posterize.png b/docs/images/img-posterize.png

new file mode 100644 (file)

index 0000000..0719c96

Binary files /dev/null and b/docs/images/img-posterize.png differ
diff --git a/docs/images/img-pymupdf.jpg b/docs/images/img-pymupdf.jpg

new file mode 100644 (file)

index 0000000..184a2d6

Binary files /dev/null and b/docs/images/img-pymupdf.jpg differ
diff --git a/docs/images/img-quads.jpg b/docs/images/img-quads.jpg

new file mode 100644 (file)

index 0000000..78dc73c

Binary files /dev/null and b/docs/images/img-quads.jpg differ
diff --git a/docs/images/img-redact.jpg b/docs/images/img-redact.jpg

new file mode 100644 (file)

index 0000000..ea2d0eb

Binary files /dev/null and b/docs/images/img-redact.jpg differ
diff --git a/docs/images/img-render-speed.png b/docs/images/img-render-speed.png

new file mode 100644 (file)

index 0000000..f85b440

Binary files /dev/null and b/docs/images/img-render-speed.png differ
diff --git a/docs/images/img-rendermode.jpg b/docs/images/img-rendermode.jpg

new file mode 100644 (file)

index 0000000..50f00d3

Binary files /dev/null and b/docs/images/img-rendermode.jpg differ
diff --git a/docs/images/img-rot+morph.png b/docs/images/img-rot+morph.png

new file mode 100644 (file)

index 0000000..c2a2367

Binary files /dev/null and b/docs/images/img-rot+morph.png differ
diff --git a/docs/images/img-rot-60.png b/docs/images/img-rot-60.png

new file mode 100644 (file)

index 0000000..06f88de

Binary files /dev/null and b/docs/images/img-rot-60.png differ
diff --git a/docs/images/img-rotate.png b/docs/images/img-rotate.png

new file mode 100644 (file)

index 0000000..dcfcf5d

Binary files /dev/null and b/docs/images/img-rotate.png differ
diff --git a/docs/images/img-showpdfpage.jpg b/docs/images/img-showpdfpage.jpg

new file mode 100644 (file)

index 0000000..b9f9fd6

Binary files /dev/null and b/docs/images/img-showpdfpage.jpg differ
diff --git a/docs/images/img-sierpinski.png b/docs/images/img-sierpinski.png

new file mode 100644 (file)

index 0000000..c680a17

Binary files /dev/null and b/docs/images/img-sierpinski.png differ
diff --git a/docs/images/img-squiggly.png b/docs/images/img-squiggly.png

new file mode 100644 (file)

index 0000000..c485cc8

Binary files /dev/null and b/docs/images/img-squiggly.png differ
diff --git a/docs/images/img-stampannot.jpg b/docs/images/img-stampannot.jpg

new file mode 100644 (file)

index 0000000..abd8c1c

Binary files /dev/null and b/docs/images/img-stampannot.jpg differ
diff --git a/docs/images/img-stencil.jpg b/docs/images/img-stencil.jpg

new file mode 100644 (file)

index 0000000..dd842f4

Binary files /dev/null and b/docs/images/img-stencil.jpg differ
diff --git a/docs/images/img-symbols.jpg b/docs/images/img-symbols.jpg

new file mode 100644 (file)

index 0000000..4178a61

Binary files /dev/null and b/docs/images/img-symbols.jpg differ
diff --git a/docs/images/img-target.png b/docs/images/img-target.png

new file mode 100644 (file)

index 0000000..d88adb7

Binary files /dev/null and b/docs/images/img-target.png differ
diff --git a/docs/images/img-textbox.jpg b/docs/images/img-textbox.jpg

new file mode 100644 (file)

index 0000000..3617724

Binary files /dev/null and b/docs/images/img-textbox.jpg differ
diff --git a/docs/images/img-textboxtract.png b/docs/images/img-textboxtract.png

new file mode 100644 (file)

index 0000000..3ab3dfe

Binary files /dev/null and b/docs/images/img-textboxtract.png differ
diff --git a/docs/images/img-textmarker.jpg b/docs/images/img-textmarker.jpg

new file mode 100644 (file)

index 0000000..e0af037

Binary files /dev/null and b/docs/images/img-textmarker.jpg differ
diff --git a/docs/images/img-textmethods.png b/docs/images/img-textmethods.png

new file mode 100644 (file)

index 0000000..1272520

Binary files /dev/null and b/docs/images/img-textmethods.png differ
diff --git a/docs/images/img-textpage-char.png b/docs/images/img-textpage-char.png

new file mode 100644 (file)

index 0000000..118fc75

Binary files /dev/null and b/docs/images/img-textpage-char.png differ
diff --git a/docs/images/img-textpage.png b/docs/images/img-textpage.png

new file mode 100644 (file)

index 0000000..c53148f

Binary files /dev/null and b/docs/images/img-textpage.png differ
diff --git a/docs/images/img-textperformance.png b/docs/images/img-textperformance.png

new file mode 100644 (file)

index 0000000..0b70dc8

Binary files /dev/null and b/docs/images/img-textperformance.png differ
diff --git a/docs/images/img-timings.png b/docs/images/img-timings.png

new file mode 100644 (file)

index 0000000..6bc801f

Binary files /dev/null and b/docs/images/img-timings.png differ
diff --git a/docs/images/img-writeimage.png b/docs/images/img-writeimage.png

new file mode 100644 (file)

index 0000000..ee66b00

Binary files /dev/null and b/docs/images/img-writeimage.png differ
diff --git a/docs/images/mupdf-icons.jpg b/docs/images/mupdf-icons.jpg

new file mode 100644 (file)

index 0000000..88e3137

Binary files /dev/null and b/docs/images/mupdf-icons.jpg differ
diff --git a/docs/index.rst b/docs/index.rst

new file mode 100644 (file)

index 0000000..dc7088e
--- /dev/null
+++ b/docs/index.rst
@@ -0,0 +1,22 @@
+**PyMuPDF Documentation**
+=================================
+
+.. toctree::
+   :maxdepth: 4
+
+   intro
+   installation
+   tutorial
+   faq
+   module
+   classes
+   algebra
+   lowlevel
+   glossary
+   vars
+   colors
+   app1
+   app2
+   app3
+   app4
+   changes
diff --git a/docs/installation.rst b/docs/installation.rst

new file mode 100644 (file)

index 0000000..8b0668a
--- /dev/null
+++ b/docs/installation.rst
@@ -0,0 +1,64 @@
+Installation
+=============
+PyMuPDF can be installed either from sources, as described below, or from wheels (see :ref:`InstallBinary`).
+
+.. _InstallSource:
+
+Option 1: Install from Sources
+-------------------------------
+This is a three-step process.
+
+Step 1: Download PyMuPDF
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Download the sources from https://pypi.org/project/PyMuPDF/#files and decompress them.
+
+Step 2: Download and Generate MuPDF
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Download *mupdf-x.xx.x-source.tar.gz* from `MuPDF <https://mupdf.com/downloads/archive>`_ and decompress it. Make sure to download the (sub-)version with which PyMuPDF has declared compatibility.
+
+..  note:: The latest MuPDF **development sources** are available on https://github.com/ArtifexSoftware/mupdf -- this is **not** what you want here.
+
+
+**Applying any Changes and Hot Fixes to MuPDF Sources**
+
+On occasion, vital hot fixes or functional enhancements must be applied to the MuPDF sources before MuPDF is generated.
+
+Any such files are contained in the *fitz* directory of the `PyMuPDF homepage <https://github.com/pymupdf/PyMuPDF/tree/master/fitz>`_ -- their names all start with an underscore *"_"*. Currently (v1.16.x), these files and their copy destinations are the following:
+
+* *_config.h* -- PyMuPDF's configuration file. It controls which MuPDF features are included and keeps the PyMuPDF binary small by cutting away unneeded fonts from MuPDF. This file must be renamed to *config.h* and must replace MuPDF file */include/mupdf/fitz/config.h*.
+
+**Generate MuPDF**
+
+The MuPDF source includes generation procedures / makefiles for numerous platforms. For Windows platforms, Visual Studio solution and project definitions are provided.
+
+PyMuPDF's `homepage <https://github.com/pymupdf/PyMuPDF/>`_ contains additional details and hints.
+
+Step 3: Build / Setup PyMuPDF
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Adjust the setup.py script as necessary. For example, make sure that:
+
+  * the include directory is set correctly for your directory structure
+  * the object code libraries are correctly defined
+
+Then run *python setup.py install*.
+
+.. note:: You can also install from the sources of the GitHub repository. These **do not contain** the pre-generated files *fitz.py* and *fitz_wrap.c*. Instead, these files are generated by the installation script *setup.py*, which requires `SWIG <https://www.swig.org/>`_ to be installed on your system.
+
+
+.. _InstallBinary:
+
+Option 2: Install from Binaries
+--------------------------------
+You can install PyMuPDF from Python wheels. The wheels are *self-contained*, i.e. you do **not need any other software**, nor do you need to download or install MuPDF, to run PyMuPDF scripts.
+This installation option is available for all MS Windows platforms and for the most **popular 64-bit** Mac OSX and Linux platforms, for Python versions 2.7 and 3.5 through 3.8.
+Windows binaries are provided for Python **32-bit and 64-bit** versions.
+
+**Overview of wheel names (PyMuPDF version is x.xx.xx):**
+
+.. literalinclude:: wheelnames.txt
+
+
+Older versions can be found on the releases page of our GitHub repository: https://github.com/pymupdf/PyMuPDF/releases.
+
+If you unexpectedly run into problems installing the wheel for your system, please make sure you have upgraded pip to its current version.
+
diff --git a/docs/intro.rst b/docs/intro.rst

new file mode 100644 (file)

index 0000000..8b4a9aa
--- /dev/null
+++ b/docs/intro.rst
@@ -0,0 +1,59 @@
+Introduction
+==============
+
+.. image:: images/img-pymupdf.jpg
+   :align: center
+
+**PyMuPDF** is a Python binding for `MuPDF <http://www.mupdf.com/>`_ -- "a lightweight PDF and XPS viewer".
+
+MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats.
+
+These are files with extensions *.pdf*, *.xps*, *.oxps*, *.cbz*, *.fb2* or *.epub* (so you can develop **e-book viewers in Python** ...).
+
+PyMuPDF provides access to many important functions of MuPDF from within a Python environment, and we are continuously seeking to expand this function set.
+
+MuPDF stands out among all similar products for its top rendering capability and unsurpassed processing speed. At the same time, its "light weight" makes it an excellent choice for platforms where resources are typically limited, like smartphones.
+
+Check this out yourself and compare the various free PDF viewers. In terms of speed and rendering quality, `SumatraPDF <http://www.sumatrapdfreader.org/>`_ ranks at the top (apart from MuPDF's own standalone viewer), because it has switched its library basis to MuPDF!
+
+PyMuPDF had been available for several years for an earlier version of MuPDF (v1.2, then called **fitz-python**), but it was only in mid-May 2015 that its creator and a few co-workers decided to upgrade it to support current releases of MuPDF.
+
+PyMuPDF runs and has been tested on Mac, Linux, Windows XP SP2 and up, Python 2.7 through Python 3.7 (note that Python supports Windows XP only up to v3.4), 32-bit and 64-bit versions. Other platforms should work too, as long as MuPDF and Python support them.
+
+PyMuPDF is hosted on `GitHub <https://github.com/pymupdf/PyMuPDF>`_ and is also registered on `PyPI <https://pypi.org/project/PyMuPDF/>`_.
+
+We provide wheels for MS Windows and for popular Python versions on Mac OSX and Linux, so installation should be convenient for most of our users: just issue
+
+*pip install --upgrade pymupdf*
+
+If your platform is not among those supported with a wheel, your installation consists of two separate steps:
+
+1. Installation of MuPDF: this involves downloading the source from the MuPDF website and then compiling it on your machine. Before you try generating PyMuPDF, adjust *setup.py* to point to the right directories (next step).
+
+2. Installation of PyMuPDF: this step is the normal Python procedure. Usually you will have to adapt *setup.py* to point to the correct *include* and *lib* directories of your generated MuPDF.
+
+For installation details, see the Installation chapter.
+
+The main repository contains several `demo <https://github.com/pymupdf/PyMuPDF/tree/master/demo>`_ and `example <https://github.com/pymupdf/PyMuPDF/tree/master/examples>`_ programs, ranging from simple code snippets to full-featured utilities, like text extraction, PDF joiners and bookmark maintenance.
+
+Interesting **PDF manipulation and generation** functions have been added over time, including metadata and bookmark maintenance, document restructuring, annotation / link handling and document or page creation.
+
+Note on the Name *fitz*
+--------------------------
+The standard Python import statement for this library is *import fitz*. This has a historical reason:
+
+The original rendering library for MuPDF was called *Libart*.
+
+*"After Artifex Software acquired the MuPDF project, the development focus shifted on writing a new modern graphics library called Fitz. Fitz was originally intended as an R&D project to replace the aging Ghostscript graphics library, but has instead become the rendering engine powering MuPDF."* (Quoted from `Wikipedia <https://en.wikipedia.org/wiki/MuPDF>`_).
+
+License
+--------
+PyMuPDF is distributed under GNU GPL V3 (or later, at your choice).
+
+MuPDF is distributed under a separate license, the **GNU AFFERO GPL V3**.
+
+Both licenses apply when you use PyMuPDF.
+
+.. note:: Version 3 of the GNU AFFERO GPL is a lot less restrictive than its earlier versions used to be. It basically is an open source freeware license that obliges your software to also be open source and freeware. Consult `this website <http://artifex.com/licensing/>`_ if you want to create a commercial product with PyMuPDF.
+
+.. include:: version.rst
diff --git a/docs/irect.rst b/docs/irect.rst

new file mode 100644 (file)

index 0000000..684f84f
--- /dev/null
+++ b/docs/irect.rst
@@ -0,0 +1,207 @@
+.. _IRect:
+
+==========
+IRect
+==========
+
+IRect is a rectangular bounding box similar to :ref:`Rect`, except that all corner coordinates are integers. IRect is used to specify an area of pixels, e.g. to receive image data during rendering. Otherwise, many similarities exist, e.g. considerations concerning emptiness and finiteness of rectangles also apply to this class.
+
+============================== ===========================================
+**Attribute / Method**          **Short Description**
+============================== ===========================================
+:meth:`IRect.contains`         checks containment of another object
+:meth:`IRect.getArea`          calculate rectangle area
+:meth:`IRect.getRect`          return a :ref:`Rect` with same coordinates
+:meth:`IRect.getRectArea`      calculate rectangle area
+:meth:`IRect.intersect`        common part with another rectangle
+:meth:`IRect.intersects`       checks for non-empty intersection
+:meth:`IRect.morph`            transform with a point and a matrix
+:meth:`IRect.norm`             the Euclidean norm
+:meth:`IRect.normalize`        makes a rectangle finite
+:attr:`IRect.bottom_left`      bottom left point, synonym *bl*
+:attr:`IRect.bottom_right`     bottom right point, synonym *br*
+:attr:`IRect.height`           height of the rectangle
+:attr:`IRect.isEmpty`          whether rectangle is empty
+:attr:`IRect.isInfinite`       whether rectangle is infinite
+:attr:`IRect.rect`             equals result of method *getRect()*
+:attr:`IRect.top_left`         top left point, synonym *tl*
+:attr:`IRect.top_right`        top right point, synonym *tr*
+:attr:`IRect.quad`             :ref:`Quad` made from rectangle corners
+:attr:`IRect.width`            width of the rectangle
+:attr:`IRect.x0`               X-coordinate of the top left corner
+:attr:`IRect.x1`               X-coordinate of the bottom right corner
+:attr:`IRect.y0`               Y-coordinate of the top left corner
+:attr:`IRect.y1`               Y-coordinate of the bottom right corner
+============================== ===========================================
+
+**Class API**
+
+.. class:: IRect
+
+   .. method:: __init__(self)
+
+   .. method:: __init__(self, x0, y0, x1, y1)
+
+   .. method:: __init__(self, irect)
+
+   .. method:: __init__(self, sequence)
+
+      Overloaded constructors. Also see examples below and those for the :ref:`Rect` class.
+
+      If another irect is specified, a **new copy** will be made.
+
+      If sequence is specified, it must be a Python sequence type of 4 numbers (see :ref:`SequenceTypes`). Non-integer numbers will be truncated, non-numeric entries will raise an exception.
+
+      The remaining parameters are integer coordinates.
+
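The sequence form of the constructor can be illustrated with a small pure-Python model. This is not PyMuPDF's implementation. It only mimics the documented behavior, under two assumptions: that "truncated" means Python's `int()` (rounding toward zero), and that `TypeError` can stand in for whatever exception PyMuPDF actually raises. The helper name `irect_from_sequence` is hypothetical.

```python
from numbers import Real


def irect_from_sequence(seq):
    """Model of IRect(sequence): exactly 4 numbers, non-integers truncated."""
    if len(seq) != 4:
        raise ValueError("an IRect sequence must contain exactly 4 numbers")
    coords = []
    for value in seq:
        if not isinstance(value, Real):
            # stand-in for the exception PyMuPDF raises on non-numeric entries
            raise TypeError(f"non-numeric entry: {value!r}")
        coords.append(int(value))  # int() truncates toward zero
    return tuple(coords)


print(irect_from_sequence((10.7, 20.2, 100.9, 200)))  # (10, 20, 100, 200)
```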
+   .. method:: getRect()
+
+      A convenience function returning a :ref:`Rect` with the same coordinates. Also available as attribute *rect*.
+
+      :rtype: :ref:`Rect`
+
+   .. method:: getRectArea([unit])
+
+   .. method:: getArea([unit])
+
+      Calculates the area of the rectangle and, with no parameter, equals *abs(IRect)*. Like an empty rectangle, the area of an infinite rectangle is also zero.
+
+      :arg str unit: Specify required unit: respective squares of "px" (pixels, default), "in" (inches), "cm" (centimeters), or "mm" (millimeters).
+
+      :rtype: float
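A pure-Python sketch of the documented area rules follows. It assumes 72 pixels per inch, matching the PDF default of 72 points per inch. For the zero-area cases, it assumes the convention that a rectangle is empty if x0 == x1 or y0 == y1, and infinite if x0 > x1 or y0 > y1. `irect_area` is a hypothetical helper and is not part of the PyMuPDF API.

```python
# Length of one inch in each unit; the "px" entry assumes 72 pixels per inch.
_PER_INCH = {"px": 72.0, "in": 1.0, "cm": 2.54, "mm": 25.4}


def irect_area(x0, y0, x1, y1, unit="px"):
    """Model of IRect.getArea(unit): zero for empty or infinite rectangles."""
    if x0 >= x1 or y0 >= y1:  # empty (==) or infinite (>) in either direction
        return 0.0
    factor = (_PER_INCH[unit] / 72.0) ** 2  # convert square pixels to square units
    return (x1 - x0) * (y1 - y0) * factor


print(irect_area(0, 0, 10, 20))           # area in square pixels
print(irect_area(0, 0, 72, 72, "in"))     # about one square inch
```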
+
+   .. method:: intersect(ir)
+
+      The intersection (common rectangular area) of the current rectangle and *ir* is calculated and replaces the current rectangle. If either rectangle is empty, the result is also empty. If either rectangle is infinite, the other one is taken as the result -- and hence also infinite if both rectangles were infinite.
+
+      :arg rect_like ir: Second rectangle.
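The intersection rules above can be modeled on plain `(x0, y0, x1, y1)` tuples. This is a sketch, not PyMuPDF code, and it differs from the method in one respect: it returns a new tuple instead of replacing the current rectangle. It assumes the same emptiness and finiteness convention as the area sketch (empty: x0 == x1 or y0 == y1, infinite: x0 > x1 or y0 > y1). When two rectangles do not overlap, it represents the empty result as `(0, 0, 0, 0)`, which is also an assumption. All function names are hypothetical.

```python
def is_empty(r):
    """Assumed convention: empty if x0 == x1 or y0 == y1."""
    x0, y0, x1, y1 = r
    return x0 == x1 or y0 == y1


def is_infinite(r):
    """Assumed convention: infinite if x0 > x1 or y0 > y1."""
    x0, y0, x1, y1 = r
    return x0 > x1 or y0 > y1


def intersect(r1, r2):
    """Model of the IRect.intersect rules, returning a new tuple."""
    if is_empty(r1):
        return r1
    if is_empty(r2):
        return r2
    if is_infinite(r1):
        return r2  # the other rectangle is the result (infinite if both are)
    if is_infinite(r2):
        return r1
    x0, y0 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x1, y1 = min(r1[2], r2[2]), min(r1[3], r2[3])
    if x0 >= x1 or y0 >= y1:
        return (0, 0, 0, 0)  # no overlap: represented here as an empty rectangle
    return (x0, y0, x1, y1)


print(intersect((0, 0, 10, 10), (5, 5, 20, 20)))  # (5, 5, 10, 10)
```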
+
+   .. method:: contains(x)
+
+      Checks whether *x* is contained in the rectangle. It may be :data:`rect_like`, :data:`point_like` or a number. If *x* is an empty rectangle, the result is always *True*. Conversely, if the rectangle itself is empty, the result is always *False*, unless *x* is an empty rectangle or a number. If *x* is a number, it is checked to be one of the four components. *x in irect* and *irect.contains(x)* are equivalent.
+
+      :arg x: the object to check.
+      :type x: :ref:`IRect` or :ref:`Rect` or :ref:`Point` or int
+
+      :rtype: bool
+
+   .. method:: intersects(r)
+
+      Checks whether the rectangle and the :data:`rect_like` "r" contain a common non-empty :ref:`IRect`. This will always be *False* if either is infinite or empty.
+
+      :arg rect_like r: the rectangle to check.
+
+      :rtype: bool
+
+   .. method:: morph(fixpoint, matrix)
+
+      *(New in version 1.17.0)*
+      
+      Return a new quad after applying a matrix to it using a fixed point.
+
+      :arg point_like fixpoint: the fixed point.
+      :arg matrix_like matrix: the matrix.
+      :returns: a new :ref:`Quad`. This is a wrapper of the same-named quad method.
+
+   .. method:: norm()
+
+      *(New in version 1.16.0)*
+      
+      Return the Euclidean norm of the rectangle treated as a vector of four numbers.
+
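This norm treats the rectangle as the vector (x0, y0, x1, y1), so it is the square root of the sum of the squared coordinates. A minimal sketch on plain tuples follows; `rect_norm` is a hypothetical name.

```python
import math


def rect_norm(rect):
    """Euclidean norm of the four coordinates treated as a vector."""
    return math.sqrt(sum(c * c for c in rect))


print(rect_norm((0, 0, 3, 4)))  # 5.0
```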
+   .. method:: normalize()
+
+      Make the rectangle finite. This is done by shuffling rectangle corners. After this, the bottom right corner will indeed be south-east of the top left one. See :ref:`Rect` for more details.
+
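The following sketch assumes that "shuffling corners" amounts to sorting each coordinate pair. Under that assumption, normalization of a plain tuple can be written as shown below. `normalize` here is a hypothetical stand-alone function, not the PyMuPDF method.

```python
def normalize(rect):
    """Reorder coordinates so that x0 <= x1 and y0 <= y1."""
    x0, y0, x1, y1 = rect
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


print(normalize((100, 50, 10, 5)))  # (10, 5, 100, 50)
```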
+   .. attribute:: top_left
+
+   .. attribute:: tl
+
+      Equals *Point(x0, y0)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: top_right
+
+   .. attribute:: tr
+
+      Equals *Point(x1, y0)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: bottom_left
+
+   .. attribute:: bl
+
+      Equals *Point(x0, y1)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: bottom_right
+
+   .. attribute:: br
+
+      Equals *Point(x1, y1)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: quad
+
+      The quadrilateral *Quad(irect.tl, irect.tr, irect.bl, irect.br)*.
+
+      :type: :ref:`Quad`
+
+   .. attribute:: width
+
+      Contains the width of the bounding box. Equals *abs(x1 - x0)*.
+
+      :type: int
+
+   .. attribute:: height
+
+      Contains the height of the bounding box. Equals *abs(y1 - y0)*.
+
+      :type: int
+
+   .. attribute:: x0
+
+      X-coordinate of the left corners.
+
+      :type: int
+
+   .. attribute:: y0
+
+      Y-coordinate of the top corners.
+
+      :type: int
+
+   .. attribute:: x1
+
+      X-coordinate of the right corners.
+
+      :type: int
+
+   .. attribute:: y1
+
+      Y-coordinate of the bottom corners.
+
+      :type: int
+
+   .. attribute:: isInfinite
+
+      *True* if rectangle is infinite, *False* otherwise.
+
+      :type: bool
+
+   .. attribute:: isEmpty
+
+      *True* if rectangle is empty, *False* otherwise.
+
+      :type: bool
+
+
+.. note::
+
+   * This class adheres to the Python sequence protocol, so components can be accessed via their index, too. Also refer to :ref:`SequenceTypes`.
+   * Rectangles can be used with arithmetic operators -- see chapter :ref:`Algebra`.
+
diff --git a/docs/kerning.style b/docs/kerning.style

new file mode 100644 (file)

index 0000000..35ccdee
--- /dev/null
+++ b/docs/kerning.style
@@ -0,0 +1,18 @@
+fontsAlias:
+    stdBold: DejaVu Sans-Bold
+    stdBoldItalic: DejaVu Sans-BoldOblique
+    stdFont: DejaVu Sans
+    stdItalic: DejaVu Sans-Oblique
+    stdMono: Courier New
+    stdMonoBold: DejaVu Sans Mono-Bold
+    stdMonoBoldItalic: DejaVu Sans Mono-BoldOblique
+    stdMonoItalic: DejaVu Sans Mono-Oblique
+    stdSans: DejaVu Sans
+    stdSansBold: DejaVu Sans-Bold
+    stdSansBoldItalic: DejaVu Sans-BoldOblique
+    stdSansItalic: DejaVu Sans-Oblique
+    stdSerif: DejaVu Serif
+
+styles: base: kerning: true
+
+styles: bodytext: alignment: left
diff --git a/docs/link.rst b/docs/link.rst

new file mode 100644 (file)

index 0000000..cbe4738
--- /dev/null
+++ b/docs/link.rst
@@ -0,0 +1,104 @@
+.. _Link:
+
+================
+Link
+================
+Represents a pointer to somewhere (this document, other documents, the internet). Links exist per document page, and they are forward-chained to each other, starting from an initial link which is accessible by the :attr:`Page.firstLink` property.
+
+There is a parent-child relationship between a link and its page. If the page object becomes unusable (closed document, any document structure change, etc.), then so does each of its existing link objects: whenever a link property or method is accessed, an exception is raised saying that the object is "orphaned".
+
+========================= ============================================
+**Attribute / Method**    **Short Description**
+========================= ============================================
+:meth:`Link.setBorder`    modify border properties
+:meth:`Link.setColors`    modify color properties
+:attr:`Link.border`       border characteristics
+:attr:`Link.colors`       border line color
+:attr:`Link.dest`         points to link destination details
+:attr:`Link.isExternal`   external link destination?
+:attr:`Link.next`         points to next link
+:attr:`Link.rect`         clickable area in untransformed coordinates.
+:attr:`Link.uri`          link destination
+:attr:`Link.xref`         :data:`xref` number of the entry
+========================= ============================================
+
+**Class API**
+
+.. class:: Link
+
+   .. method:: setBorder(border=None, width=0, style=None, dashes=None)
+
+      PDF only: Change border width and dashing properties.
+
+      *(Changed in version 1.16.9)* Allow specification without using a dictionary. The direct parameters are used if *border* is not a dictionary.
+
+      :arg dict border: a dictionary as returned by the :attr:`border` property, with keys *"width"* (*float*), *"style"* (*str*) and *"dashes"* (*sequence*). Omitted keys leave the respective property unchanged. For example, to remove dashing, use *"dashes": []*. If dashes is not an empty sequence, "style" will automatically be set to "D" (dashed).
+
+      :arg float width: see above.
+      :arg str style: see above.
+      :arg sequence dashes: see above.
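The dictionary rules for *border* can be illustrated with a small model that merges a new border dictionary into the current one. This is a sketch of the documented behavior only, not PyMuPDF's code. It covers just the dictionary form and ignores the direct *width*, *style* and *dashes* parameters. `merged_border` is a hypothetical name.

```python
def merged_border(current, border):
    """Model of how setBorder combines a border dict with the current one."""
    new = dict(current)
    for key in ("width", "style", "dashes"):
        if key in border:  # omitted keys leave the property unchanged
            new[key] = border[key]
    if border.get("dashes"):  # a non-empty dash sequence implies style "D"
        new["style"] = "D"
    return new


current = {"width": 1.0, "style": "S", "dashes": []}
print(merged_border(current, {"dashes": [3]}))
# {'width': 1.0, 'style': 'D', 'dashes': [3]}
```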
+
+   .. method:: setColors(colors=None, stroke=None, fill=None)
+
+      Changes the "stroke" and "fill" colors.
+
+      *(Changed in version 1.16.9)* Allow colors to be directly set. These parameters are used if *colors* is not a dictionary.
+
+      :arg dict colors: a dictionary containing color specifications. For accepted dictionary keys and values see below. The most practical approach is to first make a copy of the *colors* property and then modify this dictionary as required.
+      :arg sequence stroke: see above.
+      :arg sequence fill: see above.
+
+
+   .. attribute:: colors
+
+      Meaningful for PDF only: A dictionary of two lists of floats in range *0 <= float <= 1* specifying the *stroke* and the interior (*fill*) colors. If not a PDF, *None* is returned. The stroke color is used for borders and everything that is actively painted or written ("stroked"). The lengths of these lists implicitly determine the colorspaces used: 1 = GRAY, 3 = RGB, 4 = CMYK. So *[1.0, 0.0, 0.0]* stands for RGB color red. Both lists can be *[]* if no color is specified. The value of each float *f* is mapped to the integer value *i* in range 0 to 255 via the computation *f = i / 255*.
+
+      :rtype: dict
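The color rules above (colorspace implied by list length, and *f = i / 255*) can be sketched in plain Python. These helpers are hypothetical and only restate the documented mapping. They do not call PyMuPDF. The reverse conversion rounds to the nearest integer, which is an assumption.

```python
COLORSPACES = {1: "GRAY", 3: "RGB", 4: "CMYK"}


def colorspace_of(color):
    """Colorspace implied by the list length, or None for [] (no color)."""
    if not color:
        return None
    return COLORSPACES[len(color)]


def to_floats(ints):
    """Map integers 0..255 to floats 0..1 via f = i / 255."""
    return [i / 255 for i in ints]


def to_ints(floats):
    """Inverse mapping, rounding to the nearest integer (assumed)."""
    return [round(f * 255) for f in floats]


colors = {"stroke": to_floats([255, 0, 0]), "fill": []}
print(colorspace_of(colors["stroke"]), colors["stroke"])  # RGB [1.0, 0.0, 0.0]
```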
+
+   .. attribute:: border
+
+      Meaningful for PDF only: A dictionary containing border characteristics. It will be *None* for non-PDFs and an empty dictionary if no border information exists. The following keys can occur:
+
+      * *width* -- a float indicating the border thickness in points. The value is -1.0 if no width is specified.
+
+      * *dashes* -- a sequence of integers specifying a line dash pattern. *[]* means no dashes, *[n]* means equal on-off lengths of *n* points, longer lists will be interpreted as specifying alternating on-off length values. See the :ref:`AdobeManual` page 217 for more details.
+
+      * *style* -- 1-byte border style: *S* (Solid) = solid rectangle surrounding the annotation, *D* (Dashed) = dashed rectangle surrounding the link, the dash pattern is specified by the *dashes* entry, *B* (Beveled) = a simulated embossed rectangle that appears to be raised above the surface of the page, *I* (Inset) = a simulated engraved rectangle that appears to be recessed below the surface of the page, *U* (Underline) = a single line along the bottom of the annotation rectangle.
+
+      :rtype: dict
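In the PDF dash convention, the entries of *dashes* alternate between "on" (painted) and "off" (gap) lengths, and the array is repeated as often as needed. The following sketch expands a dash sequence into its first few segments to show how *[]*, *[n]* and longer lists are interpreted. `dash_segments` is a hypothetical helper.

```python
from itertools import cycle, islice


def dash_segments(dashes, count):
    """First `count` (state, length) segments of a dash pattern."""
    if not dashes:
        return []  # no dashes: a solid line
    lengths = islice(cycle(dashes), count)
    return [("on" if i % 2 == 0 else "off", n) for i, n in enumerate(lengths)]


print(dash_segments([3], 4))
# [('on', 3), ('off', 3), ('on', 3), ('off', 3)]
```

With an odd-length list such as *[2, 1, 3]*, the on/off roles swap on every repetition: on 2, off 1, on 3, off 2, on 1, off 3.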
+      
+   .. attribute:: rect
+
+      The area that can be clicked in untransformed coordinates.
+
+      :type: :ref:`Rect`
+
+   .. attribute:: isExternal
+
+      A bool specifying whether the link target is outside of the current document.
+
+      :type: bool
+
+   .. attribute:: uri
+
+      A string specifying the link target. The meaning of this property should be evaluated in conjunction with property *isExternal*. The value may be *None*, in which case *isExternal == False*. If *uri* starts with *file://*, *mailto:*, or an internet resource name, *isExternal* is *True*. In all other cases *isExternal == False* and *uri* points to an internal location. In case of PDF documents, this should either be *#nnnn* to indicate a 1-based (!) page number *nnnn*, or a named location. The format varies for other document types, e.g. *uri = '../FixedDoc.fdoc#PG_2_LNK_1'* for page number 2 (1-based) in an XPS document.
+
+      :type: str
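The rules for PDF link URIs can be sketched as a small classifier. Two parts of it are assumptions. First, "internet resource name" is approximated by the *http://*, *https://* and *ftp://* prefixes. Second, a *#* followed only by digits is read as a 1-based page number, and anything else is treated as a named location. `describe_pdf_uri` is a hypothetical name, and the sketch does not cover XPS or other document types.

```python
import re

# file:// and mailto: come from the text; the web prefixes are assumptions.
EXTERNAL_PREFIXES = ("file://", "mailto:", "http://", "https://", "ftp://")


def describe_pdf_uri(uri):
    """Classify a PDF link URI as none / external / page / named."""
    if uri is None:
        return ("none", None)
    if uri.startswith(EXTERNAL_PREFIXES):
        return ("external", uri)
    match = re.fullmatch(r"#(\d+)", uri)
    if match:
        return ("page", int(match.group(1)) - 1)  # 1-based in the URI, 0-based here
    return ("named", uri)


print(describe_pdf_uri("#3"))  # ('page', 2)
```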
+
+   .. attribute:: xref
+
+      An integer specifying the PDF :data:`xref`. Zero if not a PDF.
+
+      :type: int
+
+   .. attribute:: next
+
+      The next link or *None*.
+
+      :type: *Link*
+
+   .. attribute:: dest
+
+      The link destination details object.
+
+      :type: :ref:`linkDest`
diff --git a/docs/linkdest.rst b/docs/linkdest.rst

new file mode 100644 (file)

index 0000000..ff01b26
--- /dev/null
+++ b/docs/linkdest.rst
@@ -0,0 +1,99 @@
+.. _linkDest:
+
+================
+linkDest
+================
+Class representing the `dest` property of an outline entry or a link. Describes the destination to which such entries point.
+
+=========================== ====================================
+**Attribute**               **Short Description**
+=========================== ====================================
+:attr:`linkDest.dest`       destination
+:attr:`linkDest.fileSpec`   file specification (path, filename)
+:attr:`linkDest.flags`      descriptive flags
+:attr:`linkDest.isMap`      is this a MAP?
+:attr:`linkDest.isUri`      is this a URI?
+:attr:`linkDest.kind`       kind of destination
+:attr:`linkDest.lt`         top left coordinates
+:attr:`linkDest.named`      name if named destination
+:attr:`linkDest.newWindow`  name of new window
+:attr:`linkDest.page`       page number
+:attr:`linkDest.rb`         bottom right coordinates
+:attr:`linkDest.uri`        URI
+=========================== ====================================
+
+**Class API**
+
+.. class:: linkDest
+
+   .. attribute:: dest
+
+      Target destination name if :attr:`linkDest.kind` is :data:`LINK_GOTOR` and :attr:`linkDest.page` is *-1*.
+
+      :type: str
+
+   .. attribute:: fileSpec
+
+      Contains the filename and path this link points to, if :attr:`linkDest.kind` is :data:`LINK_GOTOR` or :data:`LINK_LAUNCH`.
+
+      :type: str
+
+   .. attribute:: flags
+
+      A bitfield describing the validity and meaning of the different aspects of the destination. As far as possible, link destinations are constructed such that e.g. :attr:`linkDest.lt` and :attr:`linkDest.rb` can be treated as defining a bounding box. But the flags indicate which of the values were actually specified, see :ref:`linkDest Flags`.
+
+      :type: int
+
+   .. attribute:: isMap
+
+      This flag specifies whether to track the mouse position when the URI is resolved. Default value: False.
+
+      :type: bool
+
+   .. attribute:: isUri
+
+      Specifies whether this destination is an internet resource (as opposed to e.g. a local file specification in URI format).
+
+      :type: bool
+
+   .. attribute:: kind
+
+      Indicates the type of this destination, like a place in this document, a URI, a file launch, an action or a place in another file. Look at :ref:`linkDest Kinds` to see the names and numerical values.
+
+      :type: int
+
+   .. attribute:: lt
+
+      The top left :ref:`Point` of the destination.
+
+      :type: :ref:`Point`
+
+   .. attribute:: named
+
+      This destination refers to some named action to perform (e.g. a JavaScript, see :ref:`AdobeManual`). Standard actions provided are *NextPage*, *PrevPage*, *FirstPage* and *LastPage*.
+
+      :type: str
+
+   .. attribute:: newWindow
+
+      If true, the destination should be launched in a new window.
+
+      :type: bool
+
+   .. attribute:: page
+
+      The page number (in this or the target document) this destination points to. Only set if :attr:`linkDest.kind` is :data:`LINK_GOTOR` or :data:`LINK_GOTO`. May be *-1* if :attr:`linkDest.kind` is :data:`LINK_GOTOR`. In this case :attr:`linkDest.dest` contains the **name** of a destination in the target document.
+
+      :type: int
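The interplay of *kind*, *page* and *dest* can be sketched with a stand-in object. The `Dest` dataclass below is hypothetical and is not the real linkDest class. To avoid assuming the numeric values of :data:`LINK_GOTO` and :data:`LINK_GOTOR`, it identifies kinds by their constant names as strings.

```python
from dataclasses import dataclass


@dataclass
class Dest:
    """Hypothetical stand-in holding the linkDest attributes used below."""
    kind: str           # constant name as a string, e.g. "LINK_GOTO"
    page: int = -1
    dest: str = ""
    fileSpec: str = ""


def resolve_target(d):
    """Return (document, page number or destination name), or None."""
    if d.kind == "LINK_GOTO":
        return ("this document", d.page)
    if d.kind == "LINK_GOTOR":
        # page == -1: the target is the named destination in d.dest
        return (d.fileSpec, d.dest if d.page == -1 else d.page)
    return None  # page is only set for LINK_GOTO and LINK_GOTOR


print(resolve_target(Dest("LINK_GOTOR", page=-1, dest="chap2", fileSpec="other.pdf")))
# ('other.pdf', 'chap2')
```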
+
+   .. attribute:: rb
+
+      The bottom right :ref:`Point` of this destination.
+
+      :type: :ref:`Point`
+
+   .. attribute:: uri
+
+      The name of the URI this destination points to.
+
+      :type: str
diff --git a/docs/lowlevel.rst b/docs/lowlevel.rst

new file mode 100644 (file)

index 0000000..0db7cdc
--- /dev/null
+++ b/docs/lowlevel.rst
@@ -0,0 +1,11 @@
+=================================
+Low Level Functions and Classes
+=================================
+This chapter contains a number of functions and classes for the experienced user, intended for special needs or performance requirements.
+
+.. toctree::
+   :maxdepth: 1
+
+   functions
+   device
+   coop_low
diff --git a/docs/make-bold.py b/docs/make-bold.py

new file mode 100644 (file)

index 0000000..c809d07
--- /dev/null
+++ b/docs/make-bold.py
@@ -0,0 +1,76 @@
+"""
+Problem: Since MuPDF v1.16 a 'Freetext' annotation font is restricted to the
+"normal" versions (no bold, no italics) of Times-Roman, Helvetica, Courier.
+It is impossible to use PyMuPDF to modify this.
+
+Solution: Using Adobe's JavaScript API, it is possible to manipulate properties
+of Freetext annotations. Check out these references:
+https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/js_api_reference.pdf,
+or https://www.adobe.com/devnet/acrobat/documentation.html.
+
+Function 'this.getAnnots()' will return all annotations as an array. We loop
+over this array to set the properties of the text through the 'richContents'
+attribute.
+There is no explicit property to set text to bold, but it is possible to set
+fontWeight=800 (400 is the normal weight) of richContents.
+Other attributes, like color, italics, etc. can also be set via richContents.
+
+If we have 'FreeText' annotations created with PyMuPDF, we can make use of this
+JavaScript feature to modify the font - thus circumventing the above restriction.
+
+This program uses PyMuPDF v1.16.12 to create a push button that executes a
+JavaScript containing the desired code.
+Then open the resulting file with Adobe Reader (!).
+After clicking on the button, all Freetext annotations will be bold, and the
+file can be saved.
+If desired, the button can be removed again, using free tools like PyMuPDF or
+PDF XChange editor.
+
+Note / Caution:
+---------------
+The JavaScript will **only** work if the file is opened with Adobe Acrobat Reader!
+With other PDF viewers, the behavior is unpredictable.
+"""
+import sys
+
+import fitz
+
+# this JavaScript will execute when the button is clicked:
+jscript = """
+var annt = this.getAnnots();
+annt.forEach(function (item, index) {
+    try {
+        var span = item.richContents;
+        span.forEach(function (it, dx) {
+            it.fontWeight = 800;
+        })
+        item.richContents = span;
+    } catch (err) {}
+});
+app.alert('Done');
+"""
+i_fn = sys.argv[1]  # input file name
+o_fn = "bold-" + i_fn  # output filename
+doc = fitz.open(i_fn)  # open input
+page = doc[0]  # get desired page
+
+# ------------------------------------------------
+# make a push button for invoking the JavaScript
+# ------------------------------------------------
+
+widget = fitz.Widget()  # create widget
+
+# make it a 'PushButton'
+widget.field_type = fitz.PDF_WIDGET_TYPE_BUTTON
+widget.field_flags = fitz.PDF_BTN_FIELD_IS_PUSHBUTTON
+
+widget.rect = fitz.Rect(5, 5, 20, 20)  # button position
+
+widget.script = jscript  # fill in JavaScript source text
+widget.field_name = "Make bold"  # arbitrary name
+widget.field_value = "Off"  # arbitrary value
+widget.fill_color = (0, 0, 1)  # make button visible
+
+annot = page.addWidget(widget)  # add the widget to the page
+doc.save(o_fn)  # output the file
+
diff --git a/docs/matrix.rst b/docs/matrix.rst

new file mode 100644 (file)

index 0000000..1426f46
--- /dev/null
+++ b/docs/matrix.rst
@@ -0,0 +1,243 @@
+
+.. _Matrix:
+
+==========
+Matrix
+==========
+
+Matrix is a row-major 3x3 matrix used by image transformations in MuPDF (which complies with the respective concepts laid down in the :ref:`AdobeManual`). With matrices you can manipulate the rendered image of a page in a variety of ways: (parts of) the page can be rotated, zoomed, flipped, sheared and shifted by setting some or all of just six float values.
+
+.. |matrix| image:: images/img-matrix.png
+
+Since all points or pixels live in a two-dimensional space, one column vector of that matrix is a constant unit vector, and only the remaining six elements are used for manipulations. These six elements are usually represented by *[a, b, c, d, e, f]*. Here is how they are positioned in the matrix:
+
+|matrix|
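Following the PDF convention, a point is treated as the row vector [x, y, 1] and multiplied with this matrix, which gives x' = a*x + c*y + e and y' = b*x + d*y + f; the constant third column only keeps the trailing 1. A minimal plain-Python sketch of that rule (the helper name `transform_point` is hypothetical, no PyMuPDF involved):

```python
def transform_point(x, y, m):
    """Apply the six-value matrix m = (a, b, c, d, e, f) to the point (x, y).

    [x, y, 1] times [[a, b, 0], [c, d, 0], [e, f, 1]] equals [x', y', 1].
    """
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


identity = (1, 0, 0, 1, 0, 0)
print(transform_point(10, 20, identity))              # (10, 20)
print(transform_point(10, 20, (1, 0, 0, 1, 100, 0)))  # (110, 20): right shift, e = 100
print(transform_point(10, 20, (-1, 0, 0, 1, 0, 0)))   # (-10, 20): left-right flip, a = -1
```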
+
+Please note:
+
+    * the methods below are just convenience functions -- everything they do can also be achieved by directly manipulating the six numerical values
+    * all manipulations can be combined -- you can construct a matrix that rotates **and** shears **and** scales **and** shifts, etc. in one go. If you do this, however, please have a look at the **remarks** further down or at the :ref:`AdobeManual`.
+
+================================ ==============================================
+**Method / Attribute**             **Description**
+================================ ==============================================
+:meth:`Matrix.preRotate`         perform a rotation
+:meth:`Matrix.preScale`          perform a scaling
+:meth:`Matrix.preShear`          perform a shearing (skewing)
+:meth:`Matrix.preTranslate`      perform a translation (shifting)
+:meth:`Matrix.concat`            perform a matrix multiplication
+:meth:`Matrix.invert`            calculate the inverted matrix
+:meth:`Matrix.norm`              the Euclidean norm
+:attr:`Matrix.a`                 zoom factor X direction
+:attr:`Matrix.b`                 shearing effect Y direction
+:attr:`Matrix.c`                 shearing effect X direction
+:attr:`Matrix.d`                 zoom factor Y direction
+:attr:`Matrix.e`                 horizontal shift
+:attr:`Matrix.f`                 vertical shift
+:attr:`Matrix.isRectilinear`     true if rect corners will remain rect corners
+================================ ==============================================
+
+**Class API**
+
+.. class:: Matrix
+
+   .. method:: __init__(self)
+
+   .. method:: __init__(self, zoom-x, zoom-y)
+
+   .. method:: __init__(self, shear-x, shear-y, 1)
+
+   .. method:: __init__(self, a, b, c, d, e, f)
+
+   .. method:: __init__(self, matrix)
+
+   .. method:: __init__(self, degree)
+
+   .. method:: __init__(self, sequence)
+
+      Overloaded constructors.
+
+      Without parameters, the zero matrix *Matrix(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)* will be created.
+
+      *zoom-x*, *zoom-y* and *shear-x*, *shear-y* specify zoom or shear values (float) and create a zoom or shear matrix, respectively.
+
+      For *matrix*, a **new copy** of another matrix will be made.
+
+      The float value *degree* creates a rotation matrix which rotates anti-clockwise by that many degrees.
+
+      A *sequence* must be any Python sequence object with exactly 6 float entries (see :ref:`SequenceTypes`).
+
+      *fitz.Matrix(1, 1)*, *fitz.Matrix(0.0)* and *fitz.Matrix(fitz.Identity)* create modifiable versions of the :ref:`Identity` matrix, which looks like *[1, 0, 0, 1, 0, 0]*.
+
+   .. method:: norm()
+
+      *(New in version 1.16.0)*
+      
+      Return the Euclidean norm of the matrix as a vector.
+
+   .. method:: preRotate(deg)
+
+      Modify the matrix to perform a counter-clockwise rotation for positive *deg* degrees, else clockwise. The matrix elements of an identity matrix will change in the following way:
+
+      *[1, 0, 0, 1, 0, 0] -> [cos(deg), sin(deg), -sin(deg), cos(deg), 0, 0]*.
+
+      :arg float deg: The rotation angle in degrees (use conventional notation based on Pi = 180 degrees).
+
+   .. method:: preScale(sx, sy)
+
+      Modify the matrix to scale by the zoom factors sx and sy. Has effects on attributes *a* through *d* only: *[a, b, c, d, e, f] -> [a*sx, b*sx, c*sy, d*sy, e, f]*.
+
+      :arg float sx: Zoom factor in X direction. For the effect see description of attribute *a*.
+
+      :arg float sy: Zoom factor in Y direction. For the effect see description of attribute *d*.
+
+   .. method:: preShear(sx, sy)
+
+      Modify the matrix to perform a shearing, i.e. transformation of rectangles into parallelograms (rhomboids). Has effects on attributes *a* through *d* only: *[a, b, c, d, e, f] -> [a + c*sy, b + d*sy, c + a*sx, d + b*sx, e, f]*.
+
+      :arg float sx: Shearing effect in X direction. See attribute *c*.
+
+      :arg float sy: Shearing effect in Y direction. See attribute *b*.
+
+   .. method:: preTranslate(tx, ty)
+
+      Modify the matrix to perform a shifting / translation operation along the x and / or y axis. Has effects on attributes *e* and *f* only: *[a, b, c, d, e, f] -> [a, b, c, d, e + tx*a + ty*c, f + tx*b + ty*d]*.
+
+      :arg float tx: Translation effect in X direction. See attribute *e*.
+
+      :arg float ty: Translation effect in Y direction. See attribute *f*.
+
+   .. method:: concat(m1, m2)
+
+      Calculate the matrix product *m1 * m2* and store the result in the current matrix. Either *m1* or *m2* may be the current matrix. Be aware that matrix multiplication is not commutative, so the order of *m1* and *m2* matters.
+
+      :arg m1: First (left) matrix.
+      :type m1: :ref:`Matrix`
+
+      :arg m2: Second (right) matrix.
+      :type m2: :ref:`Matrix`
+
+   .. method:: invert(m = None)
+
+      Calculate the matrix inverse of *m* and store the result in the current matrix. Returns *1* if *m* is not invertible ("degenerate"). In this case the current matrix **will not change**. Returns *0* if *m* is invertible, and the current matrix is replaced with the inverted *m*.
+
+      :arg m: Matrix to be inverted. If not provided, the current matrix will be used.
+      :type m: :ref:`Matrix`
+
+      :rtype: int
+
+   .. attribute:: a
+
+      Scaling in X-direction **(width)**. For example, a value of 0.5 performs a shrink of the **width** by a factor of 2. If a < 0, a left-right flip will (additionally) occur.
+
+      :type: float
+
+   .. attribute:: b
+
+      Causes a shearing effect: each *Point(x, y)* will become *Point(x, y + b*x)*. Therefore, looking from left to right, horizontal lines will be tilted -- downwards if b > 0, upwards otherwise (b is the tangent of the tilting angle).
+
+      :type: float
+
+   .. attribute:: c
+
+      Causes a shearing effect: each *Point(x, y)* will become *Point(x + c*y, y)*. Therefore, looking upwards, vertical lines will be tilted -- to the left if c > 0, to the right otherwise (c is the tangent of the tilting angle).
+
+      :type: float
+
+   .. attribute:: d
+
+      Scaling in Y-direction **(height)**. For example, a value of 1.5 performs a stretch of the **height** by 50%. If d < 0, an up-down flip will (additionally) occur.
+
+      :type: float
+
+   .. attribute:: e
+
+      Causes a horizontal shift effect: Each *Point(x, y)* will become *Point(x + e, y)*. Positive (negative) values of *e* will shift right (left).
+
+      :type: float
+
+   .. attribute:: f
+
+      Causes a vertical shift effect: Each *Point(x, y)* will become *Point(x, y + f)*. Positive (negative) values of *f* will shift down (up).
+
+      :type: float
+
+   .. attribute:: isRectilinear
+
+      Rectilinear means that no shearing is present and that any rotations are integer multiples of 90 degrees. Usually this is used to confirm that (axis-aligned) rectangles before the transformation are still axis-aligned rectangles afterwards.
+
+      :type: bool
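For readers who want to follow the arithmetic behind the class API, here is a plain-Python model of it that needs no PyMuPDF. It is a sketch, not PyMuPDF's implementation, and all function names are hypothetical. Each pre* operation is modelled as multiplying an elementary matrix in front of the current one, which reproduces the identity-matrix result given for *preRotate* and yields *[1, sy, sx, 1, 0, 0]* for a shear of the identity. *invert* returns the 0 / 1 status code described for *Matrix.invert*; the tolerance used by *invert* and *is_rectilinear* is an assumption.

```python
import math

# Six-value matrices (a, b, c, d, e, f) stand for the row-major 3x3 matrix
# [[a, b, 0], [c, d, 0], [e, f, 1]] acting on row vectors [x, y, 1].
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
EPSILON = 1.1920929e-07  # assumed tolerance (single-precision epsilon)


def concat(m1, m2):
    """Matrix product m1 * m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def pre_scale(m, sx, sy):
    return concat((sx, 0.0, 0.0, sy, 0.0, 0.0), m)


def pre_shear(m, sx, sy):
    return concat((1.0, sy, sx, 1.0, 0.0, 0.0), m)


def pre_translate(m, tx, ty):
    return concat((1.0, 0.0, 0.0, 1.0, tx, ty), m)


def pre_rotate(m, deg):
    rad = math.radians(deg)
    cos_v, sin_v = math.cos(rad), math.sin(rad)
    return concat((cos_v, sin_v, -sin_v, cos_v, 0.0, 0.0), m)


def invert(m):
    """Return (0, inverse of m), or (1, m) unchanged if m is degenerate."""
    a, b, c, d, e, f = m
    det = a * d - b * c
    if abs(det) < EPSILON:
        return 1, m
    ia, ib, ic, idd = d / det, -b / det, -c / det, a / det
    # the shift part is -(e, f) mapped through the inverted 2x2 part
    return 0, (ia, ib, ic, idd, -(e * ia + f * ic), -(e * ib + f * idd))


def norm(m):
    """Euclidean norm of the six values taken as a vector."""
    return math.sqrt(sum(v * v for v in m))


def is_rectilinear(m):
    """True if no shear is present and rotations are multiples of 90 degrees."""
    a, b, c, d, _, _ = m
    return (abs(b) < EPSILON and abs(c) < EPSILON) or (
        abs(a) < EPSILON and abs(d) < EPSILON
    )


m = pre_translate(pre_scale(IDENTITY, 2, 4), 5, 5)  # equals (2, 0, 0, 4, 10, 20)
status, inv = invert(m)                              # status == 0
print(is_rectilinear(pre_rotate(IDENTITY, 90)))      # True
print(is_rectilinear(pre_rotate(IDENTITY, 30)))      # False
```

Note how *pre_translate* after *pre_scale* yields a shift of (10, 20) rather than (5, 5): the translation is applied in front, so it passes through the scaling.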
+
+.. note::
+
+   * This class adheres to the Python sequence protocol, so components can be accessed via their index, too. Also refer to :ref:`SequenceTypes`.
+   * A matrix can be used with arithmetic operators -- see chapter :ref:`Algebra`.
+   * Matrix properties can be changed and matrix methods executed consecutively. This is equivalent to multiplying the respective matrices.
+   * Matrix multiplication is **not commutative** -- changing the execution sequence in general changes the result. So it can quickly become unclear which result a series of transformations will yield.
+
+To keep results predictable for a series of matrix operations, Adobe recommends the following approach (:ref:`AdobeManual`, page 206):
+
+1. Shift ("translate")
+2. Rotate
+3. Scale or shear ("skew")
+
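The effect of the order can be checked with a few lines of plain Python. This is a sketch with hypothetical helper names (*concat*, *transform*). It uses the row-vector convention [x, y, 1] times matrix, under which *m1* in *concat(m1, m2)* acts on a point first. A shift by 100 and a zoom by 2 are combined in both orders:

```python
def concat(m1, m2):
    """Product m1 * m2 of six-value matrices (a, b, c, d, e, f).

    With row vectors [x, y, 1], m1 is applied to a point before m2.
    """
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def transform(m, x, y):
    """Transform the point (x, y) with the six-value matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


shift = (1, 0, 0, 1, 100, 0)  # shift right by 100
zoom = (2, 0, 0, 2, 0, 0)     # zoom by 2 in both directions

print(transform(concat(shift, zoom), 0, 0))  # (200, 0): the shift is zoomed too
print(transform(concat(zoom, shift), 0, 0))  # (100, 0): the shift stays at 100
```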
+
+Examples
+-------------
+Here are examples illustrating some of the achievable effects. All pictures start from a page of the PDF version of this help file and show what happens when a matrix is applied (full pages are always created, but only parts are displayed here to save space).
+
+.. |original| image:: images/img-original.png
+
+This is the original page image:
+
+|original|
+
+Shifting
+------------
+.. |e100| image:: images/img-e-is-100.png
+
+We transform it with a matrix where *e = 100* (right shift by 100 pixels).
+
+|e100|
+
+.. |f100| image:: images/img-f-is-100.png
+
+Next we do a down shift by 100 pixels: *f = 100*.
+
+|f100|
+
+Flipping
+--------------
+.. |aminus1| image:: images/img-a-is--1.png
+
+Flip the page left-right (*a = -1*).
+
+|aminus1|
+
+.. |dminus1| image:: images/img-d-is--1.png
+
+Flip up-down (*d = -1*).
+
+|dminus1|
+
+Shearing
+----------------
+.. |bnull5| image:: images/img-b-is-0.5.png
+
+First a shear in Y direction (*b = 0.5*).
+
+|bnull5|
+
+.. |cnull5| image:: images/img-c-is-0.5.png
+
+Second a shear in X direction (*c = 0.5*).
+
+|cnull5|
+
+Rotating
+---------
+.. |rot60| image:: images/img-rot-60.png
+
+Finally, a clockwise rotation by 30 degrees (*preRotate(-30)*).
+
+|rot60|
diff --git a/docs/module.rst b/docs/module.rst

new file mode 100644 (file)

index 0000000..3ff7989
--- /dev/null
+++ b/docs/module.rst
@@ -0,0 +1,411 @@
+.. _Module:
+
+============================
+Using *fitz* as a Module
+============================
+
+.. highlight:: python
+
+*(New in version 1.16.8)*
+
+PyMuPDF can also be used from the command line as a **module** to perform basic utility functions.
+
+This is work in progress and subject to change. This feature should make writing some of the most basic scripts unnecessary.
+
+As a guideline, we are using the feature set of the MuPDF command line tools. Admittedly, there is some functional overlap. On the other hand, PDF embedded files are no longer supported by MuPDF, so PyMuPDF offers something unique here.
+
+Invocation
+-----------
+
+Invoke the module like this::
+
+    python -m fitz command parameters
+
+General remarks:
+
+* Request general help via *"-h"*, or command-specific help via *"command -h"*.
+* Parameters may be abbreviated as long as the result is not ambiguous (Python 3.5 or later only).
+* Several commands support parameters *-pages* and *-xrefs*. They are intended for down-selection. Please note that:
+
+    - **page numbers** for this utility must be given **1-based**.
+    - valid :data:`xref` numbers start at 1.
+    - Specify any number of either single integers or integer ranges, separated by one comma each. A **range** is a pair of integers separated by one hyphen "-". Integers must not exceed the maximum page number or :data:`xref` number, respectively. To specify that maximum, the symbolic variable "N" may be used instead of an integer. Integers or ranges may occur several times, in any sequence and may overlap. If in a range the first number is greater than the second one, the respective items will be processed in reverse order.
+
+* You can also use the fitz module inside your script::
+
+    >>> import sys
+    >>> from fitz.__main__ import main as fitz_command
+    >>> cmd = "clean input.pdf output.pdf -pages 1,N".split()  # prepare command
+    >>> saved_parms = sys.argv[1:]  # save original parameters
+    >>> sys.argv[1:] = cmd  # store command
+    >>> fitz_command()  # execute command
+    >>> sys.argv[1:] = saved_parms  # restore original parameters
+
+* You can use the following 2-liner and compile it with `Nuitka <https://pypi.org/project/Nuitka/>`_ in either normal or standalone mode, if you want to distribute it. This will give you a command line utility with all the functions explained below::
+
+    from fitz.__main__ import main
+    main()
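The selection syntax for *-pages* and *-xrefs* can be modelled in a few lines of Python. The helper *parse_ranges* below is hypothetical and not the parser PyMuPDF itself uses. It assumes that repeated items are kept in the result and that numbers stay 1-based, as on the command line:

```python
def parse_ranges(spec, maximum):
    """Expand a selection such as "1,5-7,50-N" into a list of 1-based numbers.

    "N" stands for `maximum`. A range whose first number is greater than its
    second one is expanded in reverse order; repeats and overlaps are kept.
    """
    result = []
    for item in spec.split(","):
        item = item.strip().replace("N", str(maximum))
        if "-" in item:
            first, last = (int(x) for x in item.split("-"))
            step = 1 if first <= last else -1
            numbers = list(range(first, last + step, step))
        else:
            numbers = [int(item)]
        for n in numbers:
            if not 1 <= n <= maximum:
                raise ValueError("%i is outside the range 1..%i" % (n, maximum))
        result.extend(numbers)
    return result


print(parse_ranges("1,5-7,N", 10))  # [1, 5, 6, 7, 10]
print(parse_ranges("N-1", 4))       # [4, 3, 2, 1]
```

Since PyMuPDF's own page numbers are 0-based, a script using such a helper would pass *n - 1* to *Document.loadPage*.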
+
+
+Cleaning and Copying
+----------------------
+
+.. highlight:: text
+
+This command will optimize the PDF and store the result in a new file. You can also use it for encryption, decryption and creating sub-documents. It is largely similar to the MuPDF command line utility *"mutool clean"*::
+
+    python -m fitz clean -h
+    usage: fitz clean [-h] [-password PASSWORD]
+                    [-encryption {keep,none,rc4-40,rc4-128,aes-128,aes-256}]
+                    [-owner OWNER] [-user USER] [-garbage {0,1,2,3,4}]
+                    [-compress] [-ascii] [-linear] [-permission PERMISSION]
+                    [-sanitize] [-pretty] [-pages PAGES]
+                    input output
+
+    -------------- optimize PDF or create sub-PDF if pages given --------------
+
+    positional arguments:
+    input                 PDF filename
+    output                output PDF filename
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    -password PASSWORD    password
+    -encryption {keep,none,rc4-40,rc4-128,aes-128,aes-256}
+                          encryption method
+    -owner OWNER          owner password
+    -user USER            user password
+    -garbage {0,1,2,3,4}  garbage collection level
+    -compress             compress (deflate) output
+    -ascii                ASCII encode binary data
+    -linear               format for fast web display
+    -permission PERMISSION
+                          integer with permission levels
+    -sanitize             sanitize / clean contents
+    -pretty               prettify PDF structure
+    -pages PAGES          output selected pages, format: 1,5-7,50-N
+
+If you specify "-pages", be aware that only page-related objects are copied, **no document-level items** such as embedded files.
+
+Please consult :meth:`Document.save` for the parameter meanings.
+
+
+Extracting Fonts and Images
+----------------------------
+Extract fonts or images from selected PDF pages to a desired directory::
+
+    python -m fitz extract -h
+    usage: fitz extract [-h] [-images] [-fonts] [-output OUTPUT] [-password PASSWORD]
+                        [-pages PAGES]
+                        input
+
+    --------------------- extract images and fonts to disk --------------------
+
+    positional arguments:
+    input                 PDF filename
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    -images               extract images
+    -fonts                extract fonts
+    -output OUTPUT        output directory, defaults to current
+    -password PASSWORD    password
+    -pages PAGES          only consider these pages, format: 1,5-7,50-N
+
+**Image filenames** are built according to the naming scheme: **"img-xref.ext"**, where "ext" is the extension associated with the image and "xref" the :data:`xref` of the image PDF object.
+
+**Font filenames** consist of the fontname and the associated extension. Any spaces in the fontname are replaced with hyphens "-".
+
+The output directory must already exist.
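The two naming rules above can be sketched as follows. These are hypothetical helpers that simply restate the documented scheme. The actual extension strings depend on the image or font in question:

```python
def image_filename(xref, ext):
    """Image files are named "img-<xref>.<ext>"."""
    return "img-%i.%s" % (xref, ext)


def font_filename(fontname, ext):
    """Font files use the font name, with spaces replaced by hyphens."""
    return "%s.%s" % (fontname.replace(" ", "-"), ext)


print(image_filename(17, "png"))                # img-17.png
print(font_filename("Times New Roman", "ttf"))  # Times-New-Roman.ttf
```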
+
+.. note:: Except for output directory creation, this feature is **functionally equivalent** to and obsoletes `this script <https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/extract-imga.py>`_.
+
+
+Joining PDF Documents
+-----------------------
+To join several PDF files specify::
+
+    python -m fitz join -h
+    usage: fitz join [-h] -output OUTPUT [input [input ...]]
+
+    ---------------------------- join PDF documents ---------------------------
+
+    positional arguments:
+    input           input filenames
+
+    optional arguments:
+    -h, --help      show this help message and exit
+    -output OUTPUT  output filename
+
+    specify each input as 'filename[,password[,pages]]'
+
+
+.. note::
+
+    1. Each input must be entered as **"filename,password,pages"**. Password and pages are optional.
+    2. The password entry **is required** if the "pages" entry is used. If the PDF needs no password, leave it empty, i.e. write two consecutive commas.
+    3. The **"pages"** format is the same as explained at the top of this section.
+    4. Each input file is immediately closed after use. Therefore you can use one of them as output filename, and thus overwrite it.
+
+
+Example: To join the following files
+
+1. **file1.pdf:** all pages, back to front, no password
+2. **file2.pdf:** last page, first page, password: "secret"
+3. **file3.pdf:** pages 5 to last, no password
+
+and store the result as **output.pdf**, enter this command::
+
+    python -m fitz join -o output.pdf file1.pdf,,N-1 file2.pdf,secret,N,1 file3.pdf,,5-N
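Because the pages part may itself contain commas (as in *file2.pdf,secret,N,1*), only the first two commas can act as separators. Here is a hypothetical sketch of that splitting rule. It assumes that filenames and passwords contain no commas themselves:

```python
def split_join_input(spec):
    """Split 'filename[,password[,pages]]' into (filename, password, pages).

    Only the first two commas separate fields, so the pages part may itself
    contain commas. Missing fields are returned as empty strings.
    """
    parts = spec.split(",", 2)
    parts += [""] * (3 - len(parts))
    filename, password, pages = parts
    return filename, password, pages


print(split_join_input("file1.pdf,,N-1"))        # ('file1.pdf', '', 'N-1')
print(split_join_input("file2.pdf,secret,N,1"))  # ('file2.pdf', 'secret', 'N,1')
```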
+
+
+Low Level Information
+----------------------
+
+Display PDF internal information. Again, there are similarities to *"mutool show"*::
+
+    python -m fitz show -h
+    usage: fitz show [-h] [-password PASSWORD] [-catalog] [-trailer] [-metadata]
+                    [-xrefs XREFS] [-pages PAGES]
+                    input
+
+    ------------------------- display PDF information -------------------------
+
+    positional arguments:
+    input               PDF filename
+
+    optional arguments:
+    -h, --help          show this help message and exit
+    -password PASSWORD  password
+    -catalog            show PDF catalog
+    -trailer            show PDF trailer
+    -metadata           show PDF metadata
+    -xrefs XREFS        show selected objects, format: 1,5-7,N
+    -pages PAGES        show selected pages, format: 1,5-7,50-N
+
+Examples::
+
+    python -m fitz show x.pdf
+    PDF is password protected
+
+    python -m fitz show x.pdf -pass hugo
+    authentication unsuccessful
+
+    python -m fitz show x.pdf -pass jorjmckie
+    authenticated as owner
+    file 'x.pdf', pages: 1, objects: 19, 58 MB, PDF 1.4, encryption: Standard V5 R6 256-bit AES
+    Document contains 15 embedded files.
+
+    python -m fitz show FDA-1572_508_R6_FINAL.pdf -tr -m
+    'FDA-1572_508_R6_FINAL.pdf', pages: 2, objects: 1645, 1.4 MB, PDF 1.6, encryption: Standard V4 R4 128-bit AES
+    document contains 740 root form fields and is signed
+
+    ------------------------------- PDF metadata ------------------------------
+           format: PDF 1.6
+            title: FORM FDA 1572
+           author: PSC Publishing Services
+          subject: Statement of Investigator
+         keywords: None
+          creator: PScript5.dll Version 5.2.2
+         producer: Acrobat Distiller 9.0.0 (Windows)
+     creationDate: D:20130522104413-04'00'
+          modDate: D:20190718154905-07'00'
+       encryption: Standard V4 R4 128-bit AES
+
+    ------------------------------- PDF trailer -------------------------------
+    <<
+    /DecodeParms <<
+        /Columns 5
+        /Predictor 12
+    >>
+    /Encrypt 1389 0 R
+    /Filter /FlateDecode
+    /ID [ <9252E9E39183F2A0B0C51BE557B8A8FC> <85227BE9B84B724E8F678E1529BA8351> ]
+    /Index [ 1388 258 ]
+    /Info 1387 0 R
+    /Length 253
+    /Prev 1510559
+    /Root 1390 0 R
+    /Size 1646
+    /Type /XRef
+    /W [ 1 3 1 ]
+    >>
+
+Embedded Files Commands
+------------------------
+
+The following commands deal with embedded files -- which is a feature completely removed from MuPDF after v1.14, and hence from all its command line tools.
+
+Information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Show the embedded file names (long or short format)::
+
+    python -m fitz embed-info -h
+    usage: fitz embed-info [-h] [-name NAME] [-detail] [-password PASSWORD] input
+
+    --------------------------- list embedded files ---------------------------
+
+    positional arguments:
+    input               PDF filename
+
+    optional arguments:
+    -h, --help          show this help message and exit
+    -name NAME          if given, report only this one
+    -detail             show detail information
+    -password PASSWORD  password
+
+Example::
+
+    python -m fitz embed-info some.pdf
+    'some.pdf' contains the following 15 embedded files.
+
+    20110813_180956_0002.jpg
+    20110813_181009_0003.jpg
+    20110813_181012_0004.jpg
+    20110813_181131_0005.jpg
+    20110813_181144_0006.jpg
+    20110813_181306_0007.jpg
+    20110813_181307_0008.jpg
+    20110813_181314_0009.jpg
+    20110813_181315_0010.jpg
+    20110813_181324_0011.jpg
+    20110813_181339_0012.jpg
+    20110813_181913_0013.jpg
+    insta-20110813_180944_0001.jpg
+    markiert-20110813_180944_0001.jpg
+    neue.datei
+
+Detailed output would look like this per entry::
+
+        name: neue.datei
+    filename: text-tester.pdf
+   ufilename: text-tester.pdf
+        desc: nur zum Testen!
+        size: 4639
+      length: 1566
+
+Extraction
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Extract an embedded file like this::
+
+    python -m fitz embed-extract -h
+    usage: fitz embed-extract [-h] -name NAME [-password PASSWORD] [-output OUTPUT]
+                            input
+
+    ---------------------- extract embedded file to disk ----------------------
+
+    positional arguments:
+    input                 PDF filename
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    -name NAME            name of entry
+    -password PASSWORD    password
+    -output OUTPUT        output filename, default is stored name
+
+For details consult :meth:`Document.embeddedFileGet`. Example (refer to previous section)::
+
+    python -m fitz embed-extract some.pdf -name neue.datei
+    Saved entry 'neue.datei' as 'text-tester.pdf'
+
+Deletion
+~~~~~~~~~~~~~~~~~~~~~~~~
+Delete an embedded file like this::
+
+    python -m fitz embed-del -h
+    usage: fitz embed-del [-h] [-password PASSWORD] [-output OUTPUT] -name NAME input
+
+    --------------------------- delete embedded file --------------------------
+
+    positional arguments:
+    input                 PDF filename
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    -password PASSWORD    password
+    -output OUTPUT        output PDF filename, incremental save if none
+    -name NAME            name of entry to delete
+
+For details consult :meth:`Document.embeddedFileDel`.
+
+Insertion
+~~~~~~~~~~~~~~~~~~~~~~~~
+Add a new embedded file using this command::
+
+    python -m fitz embed-add -h
+    usage: fitz embed-add [-h] [-password PASSWORD] [-output OUTPUT] -name NAME -path
+                        PATH [-desc DESC]
+                        input
+
+    ---------------------------- add embedded file ----------------------------
+
+    positional arguments:
+    input                 PDF filename
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    -password PASSWORD    password
+    -output OUTPUT        output PDF filename, incremental save if none
+    -name NAME            name of new entry
+    -path PATH            path to data for new entry
+    -desc DESC            description of new entry
+
+*"NAME"* **must not** already exist in the PDF. For details consult :meth:`Document.embeddedFileAdd`.
+
+Updates
+~~~~~~~~~~~~~~~~~~~~~~~
+Update an existing embedded file using this command::
+
+    python -m fitz embed-upd -h
+    usage: fitz embed-upd [-h] -name NAME [-password PASSWORD] [-output OUTPUT]
+                        [-path PATH] [-filename FILENAME] [-ufilename UFILENAME]
+                        [-desc DESC]
+                        input
+
+    --------------------------- update embedded file --------------------------
+
+    positional arguments:
+    input                 PDF filename
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    -name NAME            name of entry
+    -password PASSWORD    password
+    -output OUTPUT        Output PDF filename, incremental save if none
+    -path PATH            path to new data for entry
+    -filename FILENAME    new filename to store in entry
+    -ufilename UFILENAME  new unicode filename to store in entry
+    -desc DESC            new description to store in entry
+
+    except '-name' all parameters are optional
+
+Use this method to change meta-information of the file -- just omit the *"PATH"*. For details consult :meth:`Document.embeddedFileUpd`.
+
+
+Copying
+~~~~~~~~~~~~~~~~~~~~~~~
+Copy embedded files between PDFs::
+
+    python -m fitz embed-copy -h
+    usage: fitz embed-copy [-h] [-password PASSWORD] [-output OUTPUT] -source
+                        SOURCE [-pwdsource PWDSOURCE]
+                        [-name [NAME [NAME ...]]]
+                        input
+
+    --------------------- copy embedded files between PDFs --------------------
+
+    positional arguments:
+    input                 PDF to receive embedded files
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    -password PASSWORD    password of input
+    -output OUTPUT        output PDF, incremental save to 'input' if omitted
+    -source SOURCE        copy embedded files from here
+    -pwdsource PWDSOURCE  password of 'source' PDF
+    -name [NAME [NAME ...]]
+                          restrict copy to these entries
+
+
+.. highlight:: python
diff --git a/docs/multiprocess-gui.py b/docs/multiprocess-gui.py

new file mode 100644 (file)

index 0000000..0cfa7d0
--- /dev/null
+++ b/docs/multiprocess-gui.py
@@ -0,0 +1,167 @@
+"""
+Created on 2019-05-01
+
+@author: yinkaisheng@live.com
+@copyright: 2019 yinkaisheng@live.com
+@license: GNU GPL 3.0+
+
+Demonstrate the use of multiprocessing with PyMuPDF
+-----------------------------------------------------
+This example shows some more advanced use of multiprocessing.
+The main process shows a Qt GUI and establishes a 2-way communication with
+another process, which accesses a supported document.
+"""
+import os
+import sys
+import time
+import multiprocessing as mp
+import queue
+import fitz
+from PyQt5 import QtCore, QtGui, QtWidgets
+
+my_timer = time.clock if str is bytes else time.perf_counter
+
+
+class DocForm(QtWidgets.QWidget):
+    def __init__(self):
+        super().__init__()
+        self.process = None
+        self.queNum = mp.Queue()
+        self.queDoc = mp.Queue()
+        self.pageCount = 0
+        self.curPageNum = 0
+        self.lastDir = ""
+        self.timerSend = QtCore.QTimer(self)
+        self.timerSend.timeout.connect(self.onTimerSendPageNum)
+        self.timerGet = QtCore.QTimer(self)
+        self.timerGet.timeout.connect(self.onTimerGetPage)
+        self.timerWaiting = QtCore.QTimer(self)
+        self.timerWaiting.timeout.connect(self.onTimerWaiting)
+        self.initUI()
+
+    def initUI(self):
+        vbox = QtWidgets.QVBoxLayout()
+        self.setLayout(vbox)
+
+        hbox = QtWidgets.QHBoxLayout()
+        self.btnOpen = QtWidgets.QPushButton("OpenDocument", self)
+        self.btnOpen.clicked.connect(self.openDoc)
+        hbox.addWidget(self.btnOpen)
+
+        self.btnPlay = QtWidgets.QPushButton("PlayDocument", self)
+        self.btnPlay.clicked.connect(self.playDoc)
+        hbox.addWidget(self.btnPlay)
+
+        self.btnStop = QtWidgets.QPushButton("Stop", self)
+        self.btnStop.clicked.connect(self.stopPlay)
+        hbox.addWidget(self.btnStop)
+
+        self.label = QtWidgets.QLabel("0/0", self)
+        self.label.setFont(QtGui.QFont("Verdana", 20))
+        hbox.addWidget(self.label)
+
+        vbox.addLayout(hbox)
+
+        self.labelImg = QtWidgets.QLabel("Document", self)
+        sizePolicy = QtWidgets.QSizePolicy(
+            QtWidgets.QSizePolicy.Preferred, QtWidgets.QSizePolicy.Expanding
+        )
+        self.labelImg.setSizePolicy(sizePolicy)
+        vbox.addWidget(self.labelImg)
+
+        self.setGeometry(100, 100, 400, 600)
+        self.setWindowTitle("PyMuPDF Document Player")
+        self.show()
+
+    def openDoc(self):
+        path, _ = QtWidgets.QFileDialog.getOpenFileName(
+            self,
+            "Open Document",
+            self.lastDir,
+            "All Supported Files (*.pdf;*.epub;*.xps;*.oxps;*.cbz;*.fb2);;PDF Files (*.pdf);;EPUB Files (*.epub);;XPS Files (*.xps);;OpenXPS Files (*.oxps);;CBZ Files (*.cbz);;FB2 Files (*.fb2)",
+            options=QtWidgets.QFileDialog.Options(),
+        )
+        if path:
+            self.lastDir, self.file = os.path.split(path)
+            if self.process:
+                self.queNum.put(-1)  # use -1 to notify the process to exit
+            self.timerSend.stop()
+            self.curPageNum = 0
+            self.pageCount = 0
+            self.process = mp.Process(
+                target=openDocInProcess, args=(path, self.queNum, self.queDoc)
+            )
+            self.process.start()
+            self.timerGet.start(40)
+            self.label.setText("0/0")
+            self.queNum.put(0)
+            self.startTime = time.perf_counter()
+            self.timerWaiting.start(40)
+
+    def playDoc(self):
+        self.timerSend.start(500)
+
+    def stopPlay(self):
+        self.timerSend.stop()
+
+    def onTimerSendPageNum(self):
+        if self.curPageNum < self.pageCount - 1:
+            self.queNum.put(self.curPageNum + 1)
+        else:
+            self.timerSend.stop()
+
+    def onTimerGetPage(self):
+        try:
+            ret = self.queDoc.get(False)
+            if isinstance(ret, int):
+                self.timerWaiting.stop()
+                self.pageCount = ret
+                self.label.setText("{}/{}".format(self.curPageNum + 1, self.pageCount))
+            else:  # tuple, pixmap info
+                num, samples, width, height, stride, alpha = ret
+                self.curPageNum = num
+                self.label.setText("{}/{}".format(self.curPageNum + 1, self.pageCount))
+                fmt = (
+                    QtGui.QImage.Format_RGBA8888
+                    if alpha
+                    else QtGui.QImage.Format_RGB888
+                )
+                qimg = QtGui.QImage(samples, width, height, stride, fmt)
+                self.labelImg.setPixmap(QtGui.QPixmap.fromImage(qimg))
+        except queue.Empty:
+            pass  # nothing has arrived yet; poll again on the next timer tick
+
+    def onTimerWaiting(self):
+        self.labelImg.setText(
+            'Loading "{}", {:.2f}s'.format(
+                self.file, time.perf_counter() - self.startTime
+            )
+        )
+
+    def closeEvent(self, event):
+        self.queNum.put(-1)
+        event.accept()
+
+
+def openDocInProcess(path, queNum, quePageInfo):
+    start = my_timer()
+    doc = fitz.open(path)
+    end = my_timer()
+    quePageInfo.put(doc.pageCount)
+    while True:
+        num = queNum.get()
+        if num < 0:
+            break
+        page = doc.loadPage(num)
+        pix = page.getPixmap()
+        quePageInfo.put(
+            (num, pix.samples, pix.width, pix.height, pix.stride, pix.alpha)
+        )
+    doc.close()
+    print("process exit")
+
+
+if __name__ == "__main__":
+    app = QtWidgets.QApplication(sys.argv)
+    form = DocForm()
+    sys.exit(app.exec_())
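The viewer's `onTimerGetPage` polls `queDoc` without blocking and distinguishes two kinds of message from the worker: an `int` (the page count) and a tuple of pixmap data. Below is a minimal, Qt-free sketch of that polling step using only the standard `queue` module. The function name `poll_page_queue` and the `state` dict are hypothetical stand-ins for the form's attributes, not part of the script above.

```python
import queue


def poll_page_queue(q, state):
    """Poll q once without blocking, mirroring DocForm.onTimerGetPage.

    Returns the label text "<current>/<count>" if a message was consumed,
    or None if the queue was empty.
    """
    try:
        msg = q.get(False)  # non-blocking: raises queue.Empty if nothing is queued
    except queue.Empty:
        return None
    if isinstance(msg, int):
        # first message from the worker: the document's page count
        state["pageCount"] = msg
    else:
        # pixmap message: (num, samples, width, height, stride, alpha)
        num, samples, width, height, stride, alpha = msg
        # raw samples must span stride * height bytes for QImage to read them
        if len(samples) != stride * height:
            raise ValueError("samples do not match stride * height")
        state["curPageNum"] = num
    return "{}/{}".format(state["curPageNum"] + 1, state["pageCount"])


if __name__ == "__main__":
    q = queue.Queue()
    state = {"curPageNum": 0, "pageCount": 0}
    q.put(3)  # page count
    q.put((1, bytes(2 * 6), 2, 2, 6, False))  # fake 2 x 2 RGB pixmap of page 2
    print(poll_page_queue(q, state))  # 1/3
    print(poll_page_queue(q, state))  # 2/3
    print(poll_page_queue(q, state))  # None
```

Returning `None` on an empty queue keeps the timer callback cheap: it never blocks the Qt event loop while the worker process is still rendering.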
diff --git a/docs/multiprocess-render.py b/docs/multiprocess-render.py

new file mode 100644 (file)

index 0000000..09df515
--- /dev/null
+++ b/docs/multiprocess-render.py
@@ -0,0 +1,79 @@
+"""
+Demonstrate the use of multiprocessing with PyMuPDF.
+
+Depending on the number of CPUs, the document is divided into page ranges.
+Each range is then worked on by one process.
+The type of work would typically be text extraction or page rendering. Each
+process must know where to put its results, because this processing pattern
+does not include inter-process communication or data sharing.
+
+Compared to sequential processing, speed improvements in the range of 100%
+(i.e. twice as fast) or better can be expected.
+"""
+from __future__ import print_function, division
+import sys
+import os
+import time
+from multiprocessing import Pool, cpu_count
+import fitz
+
+# choose a version specific timer function (bytes == str in Python 2)
+mytime = time.clock if str is bytes else time.perf_counter
+
+
+def render_page(vector):
+    """ Render a page range of a document.
+
+    Notes:
+        The PyMuPDF document cannot be part of the argument, because that
+        cannot be pickled. So we are being passed in just its filename.
+        This is no performance issue, because we are a separate process and
+        need to open the document anyway.
+        Any page-specific function can be processed here - rendering is just
+        an example - text extraction might be another.
+        The work must however be self-contained: no inter-process communication
+        or synchronization is possible with this design.
+        Care must also be taken with which parameters are contained in the
+        argument, because it will be passed in via pickling by the Pool class.
+        So any large objects will increase the overall duration.
+    Args:
+        vector: a list containing required parameters.
+    """
+    # recreate the arguments
+    idx = vector[0]  # this is the segment number we have to process
+    cpu = vector[1]  # number of CPUs
+    filename = vector[2]  # document filename
+    mat = vector[3]  # the matrix for rendering
+    doc = fitz.open(filename)  # open the document
+    num_pages = len(doc)  # get number of pages
+
+    # pages per segment: make sure that cpu * seg_size >= num_pages!
+    seg_size = int(num_pages / cpu + 1)
+    seg_from = idx * seg_size  # our first page number
+    seg_to = min(seg_from + seg_size, num_pages)  # last page number
+
+    for i in range(seg_from, seg_to):  # work through our page segment
+        page = doc[i]
+        # page.getText("rawdict")  # use any page-related type of work here, eg
+        pix = page.getPixmap(alpha=False, matrix=mat)
+        # store away the result somewhere ...
+        # pix.writePNG("p-%i.png" % i)
+    print("Processed page numbers %i through %i" % (seg_from, seg_to - 1))
+
+
+if __name__ == "__main__":
+    t0 = mytime()  # start a timer
+    filename = sys.argv[1]
+    mat = fitz.Matrix(0.2, 0.2)  # the rendering matrix: scale down to 20%
+    cpu = cpu_count()
+
+    # make vectors of arguments for the processes
+    vectors = [(i, cpu, filename, mat) for i in range(cpu)]
+    print("Starting %i processes for '%s'." % (cpu, filename))
+
+    pool = Pool()  # make pool of 'cpu_count()' processes
+    pool.map(render_page, vectors, 1)  # start processes passing each a vector
+
+    t1 = mytime()  # stop the timer
+    print("Total time %g seconds" % round(t1 - t0, 2))
+
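The segment arithmetic in `render_page` can be checked in isolation. With the script's `int(num_pages / cpu + 1)`, 8 pages on 4 CPUs give a segment size of 3, so the last worker receives an empty range (and prints "Processed page numbers 9 through 7"). The sketch below swaps in integer ceiling division, which still guarantees `cpu * seg_size >= num_pages` but balances the load; the helper name `page_segments` is hypothetical.

```python
def page_segments(num_pages, cpu):
    """Split range(num_pages) into at most `cpu` contiguous (start, stop) ranges.

    Uses ceiling division, so segments are as even as possible and every
    page is assigned to exactly one segment.
    """
    seg_size = -(-num_pages // cpu)  # ceil(num_pages / cpu) using integers only
    segments = []
    for idx in range(cpu):
        seg_from = idx * seg_size
        seg_to = min(seg_from + seg_size, num_pages)
        if seg_from < seg_to:  # trailing workers may have nothing to do
            segments.append((seg_from, seg_to))
    return segments


print(page_segments(10, 4))  # [(0, 3), (3, 6), (6, 9), (9, 10)]
print(page_segments(8, 4))   # [(0, 2), (2, 4), (4, 6), (6, 8)]
```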
diff --git a/docs/new-annots.py b/docs/new-annots.py

new file mode 100644 (file)

index 0000000..11a38d3
--- /dev/null
+++ b/docs/new-annots.py
@@ -0,0 +1,172 @@
+# -*- coding: utf-8 -*-
+"""
+-------------------------------------------------------------------------------
+Demo script showing how annotations can be added to a PDF using PyMuPDF.
+
+It contains the following annotation types:
+Caret, Text, FreeText, text markers (underline, strike-out, highlight,
+squiggle), Circle, Square, Line, PolyLine, Polygon, FileAttachment, Stamp
+and Redaction.
+There is some effort to vary appearances by adding colors, line ends,
+opacity, rotation, dashed lines, etc.
+
+Dependencies
+------------
+PyMuPDF v1.17.0
+-------------------------------------------------------------------------------
+"""
+from __future__ import print_function
+
+import gc
+import os
+import sys
+
+import fitz
+
+print(fitz.__doc__)
+if tuple(map(int, fitz.VersionBind.split(".")[:3])) < (1, 17, 0):  # compare as ints
+    sys.exit("PyMuPDF v1.17.0+ is needed.")
+
+gc.set_debug(gc.DEBUG_UNCOLLECTABLE)
+
+highlight = "this text is highlighted"
+underline = "this text is underlined"
+strikeout = "this text is struck out"
+squiggled = "this text is zigzag-underlined"
+red = (1, 0, 0)
+blue = (0, 0, 1)
+gold = (1, 1, 0)
+green = (0, 1, 0)
+
+displ = fitz.Rect(0, 50, 0, 50)
+r = fitz.Rect(72, 72, 220, 100)
+t1 = u"têxt üsès Lätiñ charß,\nEUR: €, mu: µ, super scripts: ²³!"
+
+
+def print_descr(annot):
+    """Print a short description to the right of each annot rect."""
+    annot.parent.insertText(
+        annot.rect.br + (10, -5), "%s annotation" % annot.type[1], color=red
+    )
+
+
+doc = fitz.open()
+page = doc.newPage()
+
+page.setRotation(0)
+
+annot = page.addCaretAnnot(r.tl)
+print_descr(annot)
+
+r = r + displ
+annot = page.addFreetextAnnot(
+    r,
+    t1,
+    fontsize=10,
+    rotate=90,
+    text_color=blue,
+    fill_color=gold,
+    align=fitz.TEXT_ALIGN_CENTER,
+)
+annot.setBorder(width=0.3, dashes=[2])
+annot.update(text_color=blue, fill_color=gold)
+
+print_descr(annot)
+r = annot.rect + displ
+
+annot = page.addTextAnnot(r.tl, t1)
+print_descr(annot)
+
+# Adding text marker annotations:
+# first insert a unique text, then search for it, then mark it
+pos = annot.rect.tl + displ.tl
+page.insertText(
+    pos,  # insertion point
+    highlight,  # inserted text
+    morph=(pos, fitz.Matrix(-5)),  # rotate around insertion point
+)
+rl = page.searchFor(highlight, quads=True)  # need a quad b/o tilted text
+annot = page.addHighlightAnnot(rl[0])
+print_descr(annot)
+pos = annot.rect.bl  # next insertion point
+
+page.insertText(pos, underline, morph=(pos, fitz.Matrix(-10)))
+rl = page.searchFor(underline, quads=True)
+annot = page.addUnderlineAnnot(rl[0])
+print_descr(annot)
+pos = annot.rect.bl
+
+page.insertText(pos, strikeout, morph=(pos, fitz.Matrix(-15)))
+rl = page.searchFor(strikeout, quads=True)
+annot = page.addStrikeoutAnnot(rl[0])
+print_descr(annot)
+pos = annot.rect.bl
+
+page.insertText(pos, squiggled, morph=(pos, fitz.Matrix(-20)))
+rl = page.searchFor(squiggled, quads=True)
+annot = page.addSquigglyAnnot(rl[0])
+print_descr(annot)
+pos = annot.rect.bl
+
+r = fitz.Rect(pos, pos.x + 75, pos.y + 35) + (0, 20, 0, 20)
+annot = page.addPolylineAnnot([r.bl, r.tr, r.br, r.tl])  # 'Polyline'
+annot.setBorder(width=0.3, dashes=[2])
+annot.setColors(stroke=blue, fill=green)
+annot.setLineEnds(fitz.PDF_ANNOT_LE_CLOSED_ARROW, fitz.PDF_ANNOT_LE_R_CLOSED_ARROW)
+annot.update(fill_color=(1, 1, 0))
+print_descr(annot)
+
+r += displ
+annot = page.addPolygonAnnot([r.bl, r.tr, r.br, r.tl])  # 'Polygon'
+annot.setBorder(width=0.3, dashes=[2])
+annot.setColors(stroke=blue, fill=gold)
+annot.setLineEnds(fitz.PDF_ANNOT_LE_DIAMOND, fitz.PDF_ANNOT_LE_CIRCLE)
+annot.update()
+print_descr(annot)
+
+r += displ
+annot = page.addLineAnnot(r.tr, r.bl)  # 'Line'
+annot.setBorder(width=0.3, dashes=[2])
+annot.setColors(stroke=blue, fill=gold)
+annot.setLineEnds(fitz.PDF_ANNOT_LE_DIAMOND, fitz.PDF_ANNOT_LE_CIRCLE)
+annot.update()
+print_descr(annot)
+
+r += displ
+annot = page.addRectAnnot(r)  # 'Square'
+annot.setBorder(width=1, dashes=[1, 2])
+annot.setColors(stroke=blue, fill=gold)
+annot.update(opacity=0.5)
+print_descr(annot)
+
+r += displ
+annot = page.addCircleAnnot(r)  # 'Circle'
+annot.setBorder(width=0.3, dashes=[2])
+annot.setColors(stroke=blue, fill=gold)
+annot.update()
+print_descr(annot)
+
+r += displ
+annot = page.addFileAnnot(
+    r.tl, b"just anything for testing", "testdata.txt"  # 'FileAttachment'
+)
+print_descr(annot)  # annot.rect
+
+r += displ
+annot = page.addStampAnnot(r, stamp=10)  # 'Stamp'
+annot.setColors(stroke=green)
+annot.update()
+print_descr(annot)
+
+r += displ + (0, 0, 50, 10)
+rc = page.insertTextbox(
+    r,
+    "This content will be removed upon applying the redaction.",
+    color=blue,
+    align=fitz.TEXT_ALIGN_CENTER,
+)
+annot = page.addRedactAnnot(r)
+print_descr(annot)
+
+outfile = os.path.splitext(os.path.abspath(__file__))[0] + "-%i.pdf" % page.rotation
+doc.save(outfile, deflate=True)
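The script's original version check compared lists of strings, which is lexicographic per character: `"9" > "17"`, so a version like 1.9.0 would wrongly pass a "1.17.0 or newer" test. A small sketch of the safer integer comparison follows; `version_tuple` is a hypothetical helper and assumes purely numeric version components such as `"1.17.4"`.

```python
def version_tuple(version):
    """Turn a dotted version string like "1.17.4" into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


# String lists compare character by character, so "9" sorts after "17".
print("1.9.0".split(".") < ["1", "17", "0"])  # False (wrong answer)
print(version_tuple("1.9.0") < (1, 17, 0))    # True
```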
diff --git a/docs/outline.rst b/docs/outline.rst

new file mode 100644 (file)

index 0000000..c72bf3c
--- /dev/null
+++ b/docs/outline.rst
@@ -0,0 +1,73 @@
+.. _Outline:
+
+================
+Outline
+================
+
+*outline* (or "bookmark") is a property of *Document*. If not *None*, it stands for the first outline item of the document. Its properties in turn define the characteristics of this item and also point to other outline items in "horizontal" or downward direction. The full tree of all outline items, e.g. for a conventional table of contents (TOC), can be recovered by following these "pointers".
+
+============================ ==================================================
+**Method / Attribute**       **Short Description**
+============================ ==================================================
+:attr:`Outline.down`         next item downwards
+:attr:`Outline.next`         next item same level
+:attr:`Outline.page`         page number (0-based)
+:attr:`Outline.title`        title
+:attr:`Outline.uri`          string further specifying the outline target
+:attr:`Outline.isExternal`   target is outside this document
+:attr:`Outline.is_open`      whether sub-outlines are open or collapsed
+:attr:`Outline.isOpen`       whether sub-outlines are open or collapsed
+:attr:`Outline.dest`         points to link destination details
+============================ ==================================================
+
+**Class API**
+
+.. class:: Outline
+
+   .. attribute:: down
+
+      The next outline item on the next level down. Is *None* if the item has no children.
+
+      :type: :ref:`Outline`
+
+   .. attribute:: next
+
+      The next outline item at the same level as this item. Is *None* if this is the last one in its level.
+
+      :type: :ref:`Outline`
+
+   .. attribute:: page
+
+      The page number (0-based) this bookmark points to.
+
+      :type: int
+
+   .. attribute:: title
+
+      The item's title as a string or *None*.
+
+      :type: str
+
+   .. attribute:: is_open
+
+      Or *isOpen* -- an indicator showing whether any sub-outlines should be expanded (*True*) or be collapsed (*False*). This information should be interpreted by PDF display software accordingly.
+
+      :type: bool
+
+   .. attribute:: isExternal
+
+      A bool specifying whether the target is outside (*True*) of the current document.
+
+      :type: bool
+
+   .. attribute:: uri
+
+      A string specifying the link target. The meaning of this property should be evaluated in conjunction with *isExternal*. The value may be *None*, in which case *isExternal == False*. If *uri* starts with *file://*, *mailto:*, or an internet resource name, *isExternal* is *True*. In all other cases *isExternal == False* and *uri* points to an internal location. In case of PDF documents, this should either be *#nnnn* to indicate a 1-based (!) page number *nnnn*, or a named location. The format varies for other document types, e.g. *uri = '../FixedDoc.fdoc#PG_21_LNK_84'* for page number 21 (1-based) in an XPS document.
+
+      :type: str
+
+   .. attribute:: dest
+
+      The link destination details object.
+
+      :type: :ref:`linkDest`
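Following the *down* and *next* pointers recovers the whole outline tree. PyMuPDF's `Document.getToC` already does this for you; the sketch below only illustrates the pointer structure. `FakeOutline` is a hypothetical stand-in with the same attribute names as `Outline`, so the example runs without a document; with a real file you would pass `doc.outline` instead. `uri_to_page` handles only the PDF-internal `#nnnn` form described under *uri* and returns *None* for named locations or external targets.

```python
class FakeOutline:
    """Hypothetical stand-in using the same attribute names as fitz.Outline."""

    def __init__(self, title, page, down=None, nxt=None):
        self.title = title
        self.page = page  # 0-based page number
        self.down = down  # first child item, or None
        self.next = nxt   # next sibling at the same level, or None


def walk_outline(item, level=1):
    """Return [level, title, 1-based page] entries in document order."""
    toc = []
    while item is not None:  # follow the "horizontal" next pointers
        toc.append([level, item.title, item.page + 1])
        if item.down is not None:  # descend into the children first
            toc.extend(walk_outline(item.down, level + 1))
        item = item.next
    return toc


def uri_to_page(uri):
    """Map a PDF-internal uri "#nnnn" (1-based page) to a 0-based page number."""
    if uri and uri.startswith("#") and uri[1:].isdigit():
        return int(uri[1:]) - 1
    return None  # named location, external target or no uri at all


chapter2 = FakeOutline("Chapter 2", 10)
section = FakeOutline("Section 1.1", 3)
chapter1 = FakeOutline("Chapter 1", 0, down=section, nxt=chapter2)
print(walk_outline(chapter1))
# [[1, 'Chapter 1', 1], [2, 'Section 1.1', 4], [1, 'Chapter 2', 11]]
```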
diff --git a/docs/page.rst b/docs/page.rst

new file mode 100644 (file)

index 0000000..d1189d8
--- /dev/null
+++ b/docs/page.rst
@@ -0,0 +1,1236 @@
+.. _Page:
+
+================
+Page
+================
+
+Class representing a document page. A page object is created by :meth:`Document.loadPage` or, equivalently, via indexing the document like *doc[n]* - it has no independent constructor.
+
+There is a parent-child relationship between a document and its pages. If the document is closed or deleted, all page objects (and their respective children, too) in existence become unusable ("orphaned"): if a page property or method is then used, an exception is raised.
+
+Several page methods have a :ref:`Document` counterpart for convenience. At the end of this chapter you will find a synopsis.
+
+Modifying Pages
+---------------
+Changing page properties and adding or changing page content is available for PDF documents only.
+
+In a nutshell, this is what you can do with PyMuPDF:
+
+* Modify page rotation and the visible part ("CropBox") of the page.
+* Insert images, other PDF pages, text and simple geometrical objects.
+* Add annotations and form fields.
+
+.. note::
+
+   Methods require coordinates (points, rectangles) to put content in desired places. Please be aware that since v1.17.0 these coordinates **must always** be provided relative to the **unrotated** page. The reverse is also true: except for :attr:`Page.rect` and :meth:`Page.bound` (both of which *reflect* the page rotation), all coordinates returned by methods and attributes pertain to the unrotated page.
+
+   So the value returned by e.g. :meth:`Page.getImageBbox` will not change if you do a :meth:`Page.setRotation`. The same is true for coordinates returned by :meth:`Page.getText`, annotation rectangles, and so on. If you want to find out where an object is located in **rotated coordinates**, multiply its coordinates by :attr:`Page.rotationMatrix`. There is also its inverse, :attr:`Page.derotationMatrix`, which you can use when interfacing with other readers that may behave differently in this respect.
+
+.. note::
+
+   If you add or update annotations, links or form fields on the page and immediately afterwards need to work with them (i.e. **without leaving the page**), you should reload the page using :meth:`Document.reload_page` before referring to these new or updated items.
+
+   This ensures all your changes have been fully applied to PDF structures, so you can safely create Pixmaps or successfully iterate over annotations, links and form fields.
+
+================================= =======================================================
+**Method / Attribute**            **Short Description**
+================================= =======================================================
+:meth:`Page.addCaretAnnot`        PDF only: add a caret annotation
+:meth:`Page.addCircleAnnot`       PDF only: add a circle annotation
+:meth:`Page.addFileAnnot`         PDF only: add a file attachment annotation
+:meth:`Page.addFreetextAnnot`     PDF only: add a free text annotation
+:meth:`Page.addHighlightAnnot`    PDF only: add a "highlight" annotation
+:meth:`Page.addInkAnnot`          PDF only: add an ink annotation
+:meth:`Page.addLineAnnot`         PDF only: add a line annotation
+:meth:`Page.addPolygonAnnot`      PDF only: add a polygon annotation
+:meth:`Page.addPolylineAnnot`     PDF only: add a multi-line annotation
+:meth:`Page.addRectAnnot`         PDF only: add a rectangle annotation
+:meth:`Page.addRedactAnnot`       PDF only: add a redaction annotation
+:meth:`Page.addSquigglyAnnot`     PDF only: add a "squiggly" annotation
+:meth:`Page.addStampAnnot`        PDF only: add a "rubber stamp" annotation
+:meth:`Page.addStrikeoutAnnot`    PDF only: add a "strike-out" annotation
+:meth:`Page.addTextAnnot`         PDF only: add a comment
+:meth:`Page.addUnderlineAnnot`    PDF only: add an "underline" annotation
+:meth:`Page.addWidget`            PDF only: add a PDF Form field
+:meth:`Page.annot_names`          PDF only: a list of annotation and widget names
+:meth:`Page.annots`               return a generator over the annots on the page
+:meth:`Page.apply_redactions`     PDF only: process the redactions of the page
+:meth:`Page.bound`                rectangle of the page
+:meth:`Page.deleteAnnot`          PDF only: delete an annotation
+:meth:`Page.deleteLink`           PDF only: delete a link
+:meth:`Page.drawBezier`           PDF only: draw a cubic Bezier curve
+:meth:`Page.drawCircle`           PDF only: draw a circle
+:meth:`Page.drawCurve`            PDF only: draw a special Bezier curve
+:meth:`Page.drawLine`             PDF only: draw a line
+:meth:`Page.drawOval`             PDF only: draw an oval / ellipse
+:meth:`Page.drawPolyline`         PDF only: connect a point sequence
+:meth:`Page.drawRect`             PDF only: draw a rectangle
+:meth:`Page.drawSector`           PDF only: draw a circular sector
+:meth:`Page.drawSquiggle`         PDF only: draw a squiggly line
+:meth:`Page.drawZigzag`           PDF only: draw a zig-zagged line
+:meth:`Page.getFontList`          PDF only: get list of used fonts
+:meth:`Page.getImageBbox`         PDF only: get bbox of embedded image
+:meth:`Page.getImageList`         PDF only: get list of used images
+:meth:`Page.getLinks`             get all links
+:meth:`Page.getPixmap`            create a page image in raster format
+:meth:`Page.getSVGimage`          create a page image in SVG format
+:meth:`Page.getText`              extract the page's text
+:meth:`Page.getTextPage`          create a TextPage for the page
+:meth:`Page.insertFont`           PDF only: insert a font for use by the page
+:meth:`Page.insertImage`          PDF only: insert an image
+:meth:`Page.insertLink`           PDF only: insert a link
+:meth:`Page.insertText`           PDF only: insert text
+:meth:`Page.insertTextbox`        PDF only: insert a text box
+:meth:`Page.links`                return a generator of the links on the page
+:meth:`Page.loadAnnot`            PDF only: load a specific annotation
+:meth:`Page.loadLinks`            return the first link on a page
+:meth:`Page.newShape`             PDF only: create a new :ref:`Shape`
+:meth:`Page.searchFor`            search for a string
+:meth:`Page.setCropBox`           PDF only: modify the visible page
+:meth:`Page.setMediaBox`          PDF only: modify the mediabox
+:meth:`Page.setRotation`          PDF only: set page rotation
+:meth:`Page.showPDFpage`          PDF only: display PDF page image
+:meth:`Page.updateLink`           PDF only: modify a link
+:meth:`Page.widgets`              return a generator over the fields on the page
+:meth:`Page.writeText`            write one or more :ref:`Textwriter` objects
+:attr:`Page.CropBox`              the page's :data:`CropBox`
+:attr:`Page.CropBoxPosition`      displacement of the :data:`CropBox`
+:attr:`Page.firstAnnot`           first :ref:`Annot` on the page
+:attr:`Page.firstLink`            first :ref:`Link` on the page
+:attr:`Page.firstWidget`          first widget (form field) on the page
+:attr:`Page.MediaBox`             the page's :data:`MediaBox`
+:attr:`Page.MediaBoxSize`         bottom-right point of :data:`MediaBox`
+:attr:`Page.derotationMatrix`     PDF only: get coordinates in unrotated page space
+:attr:`Page.rotationMatrix`       PDF only: get coordinates in rotated page space
+:attr:`Page.transformationMatrix` PDF only: translate between PDF and MuPDF space
+:attr:`Page.number`               page number
+:attr:`Page.parent`               owning document object
+:attr:`Page.rect`                 rectangle of the page
+:attr:`Page.rotation`             PDF only: page rotation
+:attr:`Page.xref`                 PDF only: page :data:`xref`
+================================= =======================================================
+
+**Class API**
+
+.. class:: Page
+
+   .. method:: bound()
+
+      Determine the rectangle of the page. Same as property :attr:`Page.rect` below. For PDF documents this **usually** also coincides with :data:`MediaBox` and :data:`CropBox`, but not always. For example, if the page is rotated, then this is reflected by this method -- the :attr:`Page.CropBox` however will not change.
+
+      :rtype: :ref:`Rect`
+
+   .. method:: addCaretAnnot(point)
+
+      *(New in version 1.16.0)*
+      
+      PDF only: Add a caret icon. A caret annotation is a visual symbol normally used to indicate the presence of text edits on the page.
+
+      :arg point_like point: the top left point of a 20 x 20 rectangle containing the MuPDF-provided icon.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation.
+
+      .. image:: images/img-caret-annot.jpg
+         :scale: 70
+
+   .. method:: addTextAnnot(point, text, icon="Note")
+
+      PDF only: Add a comment icon ("sticky note") with accompanying text. Only the icon is visible, the accompanying text is hidden and can be visualized by many PDF viewers by hovering the mouse over the symbol.
+
+      :arg point_like point: the top left point of a 20 x 20 rectangle containing the MuPDF-provided "note" icon.
+
+      :arg str text: the commentary text. This will be shown on double clicking or hovering over the icon. May contain any Latin characters.
+      :arg str icon: *(new in version 1.16.0)* choose one of "Note" (default), "Comment", "Help", "Insert", "Key", "NewParagraph", "Paragraph" as the visual symbol for the embodied text [#f4]_.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation.
+
+   .. index::
+      pair: color; addFreetextAnnot
+      pair: fontname; addFreetextAnnot
+      pair: fontsize; addFreetextAnnot
+      pair: rect; addFreetextAnnot
+      pair: rotate; addFreetextAnnot
+      pair: align; addFreetextAnnot
+
+   .. method:: addFreetextAnnot(rect, text, fontsize=12, fontname="helv", text_color=0, fill_color=1, rotate=0, align=TEXT_ALIGN_LEFT)
+
+      PDF only: Add text in a given rectangle.
+
+      :arg rect_like rect: the rectangle into which the text should be inserted. Text is automatically wrapped to a new line at box width. Lines not fitting into the box will be invisible.
+
+      :arg str text: the text. *(New in v1.17.0)* May contain any mixture of Latin, Greek, Cyrillic, Chinese, Japanese and Korean characters. The respective required font is automatically determined.
+      :arg float fontsize: the font size. Default is 12.
+      :arg str fontname: the font name. Default is "Helv". Accepted alternatives are "Cour", "TiRo", "ZaDb" and "Symb". The name may be abbreviated to the first two characters, like "Co" for "Cour". Lower case is also accepted. *(Changed in v1.16.0)* Bold or italic variants of the fonts are **no longer accepted**. A user-contributed script provides a workaround for this restriction -- see section *Using Buttons and JavaScript* in chapter :ref:`FAQ`. *(New in v1.17.0)* The actual font to use is now determined character by character, and all required fonts (or sub-fonts) are automatically included. Therefore, you should rarely need to care about this parameter and can let it default (unless you insist on a serif font for your non-CJK text parts).
+      :arg sequence,float text_color: *(new in version 1.16.0)* the text color. Default is black.
+
+      :arg sequence,float fill_color: *(new in version 1.16.0)* the fill color. Default is white.
+      :arg int align: *(new in version 1.17.0)* text alignment, one of TEXT_ALIGN_LEFT, TEXT_ALIGN_CENTER, TEXT_ALIGN_RIGHT - justify is not supported.
+
+
+      :arg int rotate: the text orientation. Accepted values are 0, 90 and 270; invalid entries are set to zero.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation. Color properties **can only be changed** using special parameters of :meth:`Annot.update`. There, you can also set a border color different from the text color.
+
+   .. method:: addFileAnnot(pos, buffer, filename, ufilename=None, desc=None, icon="PushPin")
+
+      PDF only: Add a file attachment annotation with a "PushPin" icon at the specified location.
+
+      :arg point_like pos: the top-left point of an 18 x 18 rectangle containing the MuPDF-provided "PushPin" icon.
+
+      :arg bytes,bytearray,BytesIO buffer: the data to be stored (actual file content, any data, etc.).
+
+         *(Changed in v1.14.13)* *io.BytesIO* is now also supported.
+
+      :arg str filename: the filename to associate with the data.
+      :arg str ufilename: the optional PDF unicode version of filename. Defaults to filename.
+      :arg str desc: an optional description of the file. Defaults to filename.
+      :arg str icon: *(new in version 1.16.0)* choose one of "PushPin" (default), "Graph", "Paperclip", "Tag" as the visual symbol for the attached data [#f4]_.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation. Use methods of :ref:`Annot` to make any changes.
+
+   .. method:: addInkAnnot(list)
+
+      PDF only: Add a "freehand" scribble annotation.
+
+      :arg sequence list: a list of one or more lists, each containing :data:`point_like` items. Each item in these sublists is interpreted as a :ref:`Point` through which a connecting line is drawn. Separate sublists thus represent separate drawing lines.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation in default appearance (black line of width 1). Use annotation methods with a subsequent :meth:`Annot.update` to modify.
+
+   .. method:: addLineAnnot(p1, p2)
+
+      PDF only: Add a line annotation.
+
+      :arg point_like p1: the starting point of the line.
+
+      :arg point_like p2: the end point of the line.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation. It is drawn with line color black and line width 1. The **rectangle** is automatically created to contain both points, each one surrounded by a circle of radius 3 * line width to make room for any line end symbols.
+
+   .. method:: addRectAnnot(rect)
+
+   .. method:: addCircleAnnot(rect)
+
+      PDF only: Add a rectangle or a circle annotation.
+
+      :arg rect_like rect: the rectangle in which the circle or rectangle is drawn, must be finite and not empty. If the rectangle is not equal-sided, an ellipse is drawn.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation. It is drawn with line color red, no fill color and line width 1.
+
+   .. method:: addRedactAnnot(quad, text=None, fontname=None, fontsize=11, align=TEXT_ALIGN_LEFT, fill=(1, 1, 1), text_color=(0, 0, 0), cross_out=True)
+
+      PDF only: *(new in version 1.16.11)* Add a redaction annotation. A redaction annotation identifies content to be removed from the document. Adding such an annotation is the first of two steps. It makes visible what will be removed in the subsequent step, :meth:`Page.apply_redactions`.
+
+      :arg quad_like,rect_like quad: specifies the (rectangular) area to be removed, which is always equal to the annotation rectangle. This may be a :data:`rect_like` or :data:`quad_like` object. If a quad is specified, its enveloping rectangle is taken.
+
+      :arg str text: *(New in v1.16.12)* text to be placed in the rectangle after applying the redaction (and thus removing old content).
+
+      :arg str fontname: *(New in v1.16.12)* the font to use when *text* is given, otherwise ignored. The same rules apply as for :meth:`Page.insertTextbox` -- which is the method :meth:`Page.apply_redactions` internally invokes. The replacement text will be **vertically centered**, if this is one of the CJK or :ref:`Base-14-Fonts`.
+
+         .. note::
+            For an **existing** font of the page, use its reference name as *fontname* (*item[4]* of its entry in :meth:`Page.getFontList`). To use a new, non-builtin font, proceed as follows::
+
+               page.insertText(
+                   point,  # anywhere, but outside all redaction rectangles
+                   "something",  # some non-empty string
+                   fontname="newname",  # new, unused reference name
+                   fontfile="...",  # desired font file
+                   render_mode=3,  # makes the text invisible
+               )
+               page.addRedactAnnot(..., fontname="newname")
+
+      :arg float fontsize: *(New in v1.16.12)* the font size to use for the replacement text. If the text is too large to fit, several insertion attempts will be made, gradually reducing this value down to 4. If the text still does not fit, no text is inserted at all.
+
+      :arg int align: *(New in v1.16.12)* the horizontal alignment for the replacing text. See :meth:`insertTextbox` for available values. The vertical alignment is (approximately) centered if a PDF built-in font is used (CJK or :ref:`Base-14-Fonts`).
+
+      :arg sequence fill: *(New in v1.16.12)* the fill color of the rectangle **after applying** the redaction. The default is *white = (1, 1, 1)*, which is also taken if *None* is specified. *(Changed in v1.16.13)* To suppress a fill color altogether, specify *False*. In this case the rectangle remains transparent.
+
+      :arg sequence text_color: *(New in v1.16.12)* the color of the replacing text. Default is *black = (0, 0, 0)*.
+
+      :arg bool cross_out: *(new in v1.17.2)* add two diagonal lines to the annotation rectangle.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation. *(Changed in v1.17.2)* Its standard appearance looks like a red rectangle (no fill color), optionally showing two diagonal lines. Colors, line width, dashing, opacity and blend mode can now be set and applied via :meth:`Annot.update` like with other annotations.
+
+      .. image:: images/img-redact.jpg
+
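The *fontsize* fallback for the replacement text amounts to a shrink-to-fit loop. The sketch below is not MuPDF's actual code: `fits` is a hypothetical caller-supplied predicate (e.g. "does the text fit the rectangle at this size?"), and the 1-point step is an assumption, since the description above only says the size is reduced "gradually" down to 4.

```python
def shrink_to_fit(fits, fontsize=11, min_size=4, step=1):
    """Return the largest tried font size for which fits(size) is True, else None.

    `fits` is a hypothetical predicate; the step size is an assumption.
    """
    size = fontsize
    while size >= min_size:
        if fits(size):
            return size
        size -= step
    return None  # nothing fits: no replacement text is inserted


# Hypothetical: text width grows by 20 units per point, the box is 150 wide.
print(shrink_to_fit(lambda size: size * 20 <= 150))  # 7
```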
+   .. method:: addPolylineAnnot(points)
+
+   .. method:: addPolygonAnnot(points)
+
+      PDF only: Add an annotation consisting of lines which connect the given points. A **Polygon's** first and last points are automatically connected, which does not happen for a **PolyLine**. The **rectangle** is automatically created as the smallest rectangle containing the points, each one surrounded by a circle of radius 3 (= 3 * line width). The following shows a 'PolyLine' that has been modified with colors and line ends.
+
+      :arg list points: a list of :data:`point_like` objects.
+
+      :rtype: :ref:`Annot`
+      :returns: the created annotation. It is drawn with line color black, no fill color and line width 1. Use methods of :ref:`Annot` to make any changes to achieve something like this:
+
+      .. image:: images/img-polyline.png
+         :scale: 70
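The annotation rectangle described above can be computed by hand: take the smallest rectangle containing all points, then pad each side by 3 * line width. This sketch uses plain `(x0, y0, x1, y1)` tuples instead of :ref:`Rect` so that it runs without PyMuPDF; the helper name `polyline_annot_rect` is hypothetical, and the padding rule is taken from the description above. A line annotation is the two-point case.

```python
def polyline_annot_rect(points, line_width=1.0):
    """Smallest (x0, y0, x1, y1) containing every point padded by 3 * line_width."""
    pad = 3 * line_width  # radius of the circle around each point
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)


print(polyline_annot_rect([(72, 122), (147, 157), (147, 122)]))
# (69.0, 119.0, 150.0, 160.0)
```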
+
+   .. method:: addUnderlineAnnot(quads=None, start=None, stop=None, clip=None)
+
+   .. method:: addStrikeoutAnnot(quads=None, start=None, stop=None, clip=None)
+
+   .. method:: addSquigglyAnnot(quads=None, start=None, stop=None, clip=None)
+
+   .. method:: addHighlightAnnot(quads=None, start=None, stop=None, clip=None)
+
+      PDF only: These annotations are normally used for **marking text** which has previously been somehow located (for example via :meth:`Page.searchFor`). But this is not required: you are free to "mark" just anything.
+
+      Standard colors are chosen per annotation type: **yellow** for highlighting, **red** for strike out, **green** for underlining, and **magenta** for wavy underlining.
+
+      The methods convert the arguments into a list of :ref:`Quad` objects. The **annotation** rectangle is then calculated to envelop all these quadrilaterals.
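+
+      The enveloping rectangle is simply the bounding box of all quad corners. Below is a minimal plain-Python sketch. It assumes that each quad is given as four *(x, y)* tuples rather than as a :ref:`Quad` object, and the helper name ``envelop_quads`` is hypothetical.
+
```python
def envelop_quads(quads):
    """Bounding box (x0, y0, x1, y1) of a list of quads, each given as four
    (x, y) corner points (hypothetical helper, not PyMuPDF API)."""
    corners = [pt for quad in quads for pt in quad]
    if not corners:
        raise ValueError("no quads given")
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Two occurrences on different lines -> one common rectangle.
q1 = [(72, 100), (120, 100), (72, 112), (120, 112)]
q2 = [(300, 400), (350, 400), (300, 412), (350, 412)]
print(envelop_quads([q1, q2]))  # (72, 100, 350, 412)
```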
+
+      .. note:: :meth:`searchFor` delivers a list of either rectangles or quadrilaterals. Such a list can be directly used as parameter for these annotation types and will deliver **one common** annotation for all occurrences of the search string::
+
+           >>> quads = page.searchFor("pymupdf", hit_max=100, quads=True)
+           >>> page.addHighlightAnnot(quads)
+
+      :arg rect_like,quad_like,list,tuple quads: *(Changed in v1.14.20)* the location(s) -- rectangle(s) or quad(s) -- to be marked. A list or tuple must consist of :data:`rect_like` or :data:`quad_like` items (or even a mixture of either). Every item must be finite, convex and not empty (as applicable). *(Changed in v1.16.14)* **Set this parameter to** *None* if you want to use the following arguments.
+      :arg point_like start: *(New in v1.16.14)* start text marking at this point. Defaults to the top-left point of *clip*.
+      :arg point_like stop: *(New in v1.16.14)* stop text marking at this point. Defaults to the bottom-right point of *clip*.
+      :arg rect_like clip: *(New in v1.16.14)* only consider text lines intersecting this area. Defaults to the page rectangle.
+
+      :rtype: :ref:`Annot` or *(changed in v1.16.14)* *None*
+      :returns: the created annotation. *(Changed in v1.16.14)* If *quads* is an empty list, **no annotation** is created. To change colors, set the "stroke" color accordingly (:meth:`Annot.setColors`) and then perform an :meth:`Annot.update`.
+
+      .. note:: Starting with v1.16.14 you can use parameters *start*, *stop* and *clip* to highlight consecutive lines between the points *start* and *stop*. Use *clip* to further restrict the selected line bboxes and thus deal with e.g. multi-column pages. The following multi-line highlight on a page with three text columns was created by specifying the two red points and setting *clip* accordingly.
+
+      .. image:: images/img-markers.jpg
+         :scale: 100
+
+   .. method:: addStampAnnot(rect, stamp=0)
+
+      PDF only: Add a "rubber stamp"-like annotation, e.g. to indicate the document's intended use ("DRAFT", "CONFIDENTIAL", etc.).
+
+      :arg rect_like rect: rectangle where to place the annotation.
+
+      :arg int stamp: id number of the stamp text. For available stamps see :ref:`StampIcons`.
+
+      .. note::
+
+         * The stamp's text and its border line are automatically sized and centered horizontally and vertically in the given rectangle. :attr:`Annot.rect` is automatically calculated to fit the given **width** and will usually be smaller than this parameter.
+         * The font chosen is "Times Bold" and the text will be upper case.
+         * The appearance can be changed using :meth:`Annot.setOpacity` and by setting the "stroke" color (no "fill" color supported).
+         * This can be used to create watermark images: on a temporary PDF page create a stamp annotation with a low opacity value, make a pixmap from it with *alpha=True* (and potentially also rotate it), discard the temporary PDF page and use the pixmap with :meth:`insertImage` for your target PDF.
+
+
+      .. image :: images/img-stampannot.jpg
+         :scale: 80
+
+   .. method:: addWidget(widget)
+
+      PDF only: Add a PDF Form field ("widget") to a page. This also **turns the PDF into a Form PDF**. Because widgets support a large number of options, we have developed a new class :ref:`Widget`, which contains the possible PDF field attributes. It must be used for both form field creation and updates.
+
+      :arg widget: a :ref:`Widget` object which must have been created beforehand.
+      :type widget: :ref:`Widget`
+
+      :returns: a widget annotation.
+
+   .. method:: deleteAnnot(annot)
+
+      PDF only: Delete the specified annotation from the page and return the next one.
+
+      *(Changed in v1.16.6)* The removal now also includes any bound 'Popup' or response annotations and related objects.
+
+      :arg annot: the annotation to be deleted.
+      :type annot: :ref:`Annot`
+
+      :rtype: :ref:`Annot`
+      :returns: the annotation following the deleted one. Please remember that the annotation is physically removed only when saving to a new file with a positive garbage collection option.
+
+   .. method:: apply_redactions()
+
+      PDF only: *(New in version 1.16.11)* Remove all **text content** contained in any redaction rectangle.
+
+      *(Changed in v1.16.12)* The previous *mark* parameter is gone. Instead, the respective rectangles are filled with the individual *fill* color of each redaction annotation. If a *text* was given in the annotation, then :meth:`insertTextbox` is invoked to insert it, using parameters provided with the redaction.
+
+      **This method applies and then deletes all redaction annotations from the page.**
+
+      :returns: *True* if at least one redaction annotation has been processed, *False* otherwise.
+
+      .. note::
+         Text contained in a redaction rectangle will be **physically** removed from the page and will no longer appear in e.g. text extractions or anywhere else. Other annotations are unaffected.
+
+         Images and links will also **physically** be removed from the page. For an image, overlapping parts will be blanked-out. Links will always be completely removed.
+
+         Text removal is done by character: A character is removed if its bbox has a **non-empty intersection** with a redaction *(changed in v1.17)*.
+
+         Redactions are an easy way to replace single words in a PDF, or to just physically remove them from the PDF: locate the word "secret" using some text extraction or search method and insert a redaction using "xxxxxx" as replacement text for each occurrence.
+
+            * Be wary if the replacement is longer than the original -- this may lead to an awkward appearance, line breaks or no new text at all.
+
+            * For a number of reasons, the new text may not be positioned exactly on the same line as the old one, especially if the replacement font is neither a CJK font nor one of the :ref:`Base-14-Fonts`.
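+
+      The following plain-Python sketch makes the character rule concrete. It keeps only those characters whose bounding box does not intersect any redaction rectangle. Two assumptions apply to this illustration: rectangles are *(x0, y0, x1, y1)* tuples, and "non-empty intersection" is read as positive overlap in both directions. The helper names are hypothetical.
+
```python
def intersects(a, b):
    """True if rectangles a and b overlap with positive area."""
    return (min(a[2], b[2]) > max(a[0], b[0]) and
            min(a[3], b[3]) > max(a[1], b[1]))


def surviving_chars(chars, redactions):
    """Return the characters that survive redaction (hypothetical helper).

    chars: list of (character, bbox) pairs; redactions: list of bboxes.
    A character is dropped if its bbox intersects any redaction rectangle.
    """
    return [(c, bbox) for c, bbox in chars
            if not any(intersects(bbox, r) for r in redactions)]


chars = [("a", (10, 10, 16, 20)), ("b", (16, 10, 22, 20)), ("c", (22, 10, 28, 20))]
# The redaction only grazes "b" (1 unit wide), yet "b" is removed as a whole.
kept = surviving_chars(chars, [(20, 5, 21, 25)])
print("".join(c for c, _ in kept))  # ac
```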
+
+   .. method:: deleteLink(linkdict)
+
+      PDF only: Delete the specified link from the page. The parameter must be an **original item** of :meth:`getLinks()` (see below). The reason for this is the dictionary's *"xref"* key, which identifies the PDF object to be deleted.
+
+      :arg dict linkdict: the link to be deleted.
+
+   .. method:: insertLink(linkdict)
+
+      PDF only: Insert a new link on this page. The parameter must be a dictionary of format as provided by :meth:`getLinks()` (see below).
+
+      :arg dict linkdict: the link to be inserted.
+
+   .. method:: updateLink(linkdict)
+
+      PDF only: Modify the specified link. The parameter must be a (modified) **original item** of :meth:`getLinks()` (see below). The reason for this is the dictionary's *"xref"* key, which identifies the PDF object to be changed.
+
+      :arg dict linkdict: the link to be modified.
+
+   .. method:: getLinks()
+
+      Retrieves **all** links of a page.
+
+      :rtype: list
+      :returns: A list of dictionaries. For a description of the dictionary entries see below. Always use this or the :meth:`Page.links` method if you intend to make changes to the links of a page.
+
+   .. method:: links(kinds=None)
+
+      *(New in version 1.16.4)*
+      
+      Return a generator over the page's links. The results equal the entries of :meth:`Page.getLinks`.
+
+      :arg sequence kinds: a sequence of integers to down-select to one or more link kinds. Default is all links. Example: *kinds=(fitz.LINK_GOTO,)* will only return internal links.
+
+      :rtype: generator
+      :returns: an entry of :meth:`Page.getLinks()` for each iteration.
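+
+      Here is a minimal sketch of what the *kinds* filter does, written against plain dictionaries rather than a real page. Each link dictionary is assumed to carry a ``"kind"`` key, as the entries of :meth:`Page.getLinks` do. The constant values below are assumptions meant to mirror *fitz.LINK_GOTO* and *fitz.LINK_URI*, and the sample data is made up.
+
```python
LINK_GOTO = 1  # assumed to mirror fitz.LINK_GOTO
LINK_URI = 2   # assumed to mirror fitz.LINK_URI


def filter_links(links, kinds=None):
    """Yield link dicts whose "kind" is in `kinds` (all links if None)."""
    for link in links:
        if kinds is None or link["kind"] in kinds:
            yield link


sample = [
    {"kind": LINK_GOTO, "page": 3},
    {"kind": LINK_URI, "uri": "https://example.com"},
]
internal = list(filter_links(sample, kinds=(LINK_GOTO,)))
print(internal)  # [{'kind': 1, 'page': 3}]
```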
+
+   .. method:: annots(types=None)
+
+      *(New in version 1.16.4)*
+      
+      Return a generator over the page's annotations.
+
+      :arg sequence types: a sequence of integers to down-select to one or more annotation types. Default is all annotations. Example: *types=(fitz.PDF_ANNOT_FREETEXT, fitz.PDF_ANNOT_TEXT)* will only return 'FreeText' and 'Text' annotations.
+
+      :rtype: generator
+      :returns: an :ref:`Annot` for each iteration.
+
+   .. method:: widgets(types=None)
+
+      *(New in version 1.16.4)*
+      
+      Return a generator over the page's form fields.
+
+      :arg sequence types: a sequence of integers to down-select to one or more widget types. Default is all form fields. Example: *types=(fitz.PDF_WIDGET_TYPE_TEXT,)* will only return 'Text' fields.
+
+      :rtype: generator
+      :returns: a :ref:`Widget` for each iteration.
+
+
+   .. method:: writeText(rect=None, writers=None, overlay=True, color=None, opacity=None, keep_proportion=True, rotate=0)
+
+      *(New in version 1.16.18)*
+      
+      PDF only: Write the text of one or more :ref:`TextWriter` objects to the page.
+
+      :arg rect_like rect: where to place the text. If omitted, the rectangle union of the text writers is used.
+      :arg sequence writers: a non-empty tuple / list of :ref:`TextWriter` objects or a single :ref:`TextWriter`.
+      :arg float opacity: set the opacity; overrides the respective value in the text writers.
+      :arg sequence color: set the text color; overrides the respective value in the text writers.
+      :arg bool overlay: put the text in foreground or background.
+      :arg bool keep_proportion: maintain the aspect ratio.
+      :arg float rotate: rotate the text by an arbitrary angle.
+
+      .. note:: Parameters *overlay*, *keep_proportion* and *rotate* have the same meaning as in :meth:`showPDFpage`.
+
+
+   .. index::
+      pair: border_width; insertText
+      pair: color; insertText
+      pair: encoding; insertText
+      pair: fill; insertText
+      pair: fontfile; insertText
+      pair: fontname; insertText
+      pair: fontsize; insertText
+      pair: morph; insertText
+      pair: overlay; insertText
+      pair: render_mode; insertText
+      pair: rotate; insertText
+
+   .. method:: insertText(point, text, fontsize=11, fontname="helv", fontfile=None, idx=0, color=None, fill=None, render_mode=0, border_width=1, encoding=TEXT_ENCODING_LATIN, rotate=0, morph=None, overlay=True)
+
+      PDF only: Insert text starting at :data:`point_like` *point*. See :meth:`Shape.insertText`.
+
+   .. index::
+      pair: align; insertTextbox
+      pair: border_width; insertTextbox
+      pair: color; insertTextbox
+      pair: encoding; insertTextbox
+      pair: expandtabs; insertTextbox
+      pair: fill; insertTextbox
+      pair: fontfile; insertTextbox
+      pair: fontname; insertTextbox
+      pair: fontsize; insertTextbox
+      pair: morph; insertTextbox
+      pair: overlay; insertTextbox
+      pair: render_mode; insertTextbox
+      pair: rotate; insertTextbox
+
+   .. method:: insertTextbox(rect, buffer, fontsize=11, fontname="helv", fontfile=None, idx=0, color=None, fill=None, render_mode=0, border_width=1, encoding=TEXT_ENCODING_LATIN, expandtabs=8, align=TEXT_ALIGN_LEFT, charwidths=None, rotate=0, morph=None, overlay=True)
+
+      PDF only: Insert text into the specified :data:`rect_like` *rect*. See :meth:`Shape.insertTextbox`.
+
+   .. index::
+      pair: closePath; drawLine
+      pair: color; drawLine
+      pair: dashes; drawLine
+      pair: fill; drawLine
+      pair: lineCap; drawLine
+      pair: lineJoin; drawLine
+      pair: morph; drawLine
+      pair: overlay; drawLine
+      pair: width; drawLine
+
+   .. method:: drawLine(p1, p2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None)
+
+      PDF only: Draw a line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.drawLine`.
+
+   .. index::
+      pair: breadth; drawZigzag
+      pair: closePath; drawZigzag
+      pair: color; drawZigzag
+      pair: dashes; drawZigzag
+      pair: fill; drawZigzag
+      pair: lineCap; drawZigzag
+      pair: lineJoin; drawZigzag
+      pair: morph; drawZigzag
+      pair: overlay; drawZigzag
+      pair: width; drawZigzag
+
+   .. method:: drawZigzag(p1, p2, breadth=2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None)
+
+      PDF only: Draw a zigzag line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.drawZigzag`.
+
+   .. index::
+      pair: breadth; drawSquiggle
+      pair: closePath; drawSquiggle
+      pair: color; drawSquiggle
+      pair: dashes; drawSquiggle
+      pair: fill; drawSquiggle
+      pair: lineCap; drawSquiggle
+      pair: lineJoin; drawSquiggle
+      pair: morph; drawSquiggle
+      pair: overlay; drawSquiggle
+      pair: width; drawSquiggle
+
+   .. method:: drawSquiggle(p1, p2, breadth=2, color=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None)
+
+      PDF only: Draw a squiggly (wavy, undulated) line from *p1* to *p2* (:data:`point_like` \s). See :meth:`Shape.drawSquiggle`.
+
+   .. index::
+      pair: closePath; drawCircle
+      pair: color; drawCircle
+      pair: dashes; drawCircle
+      pair: fill; drawCircle
+      pair: lineCap; drawCircle
+      pair: lineJoin; drawCircle
+      pair: morph; drawCircle
+      pair: overlay; drawCircle
+      pair: width; drawCircle
+
+   .. method:: drawCircle(center, radius, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None)
+
+      PDF only: Draw a circle around *center* (:data:`point_like`) with a radius of *radius*. See :meth:`Shape.drawCircle`.
+
+   .. index::
+      pair: closePath; drawOval
+      pair: color; drawOval
+      pair: dashes; drawOval
+      pair: fill; drawOval
+      pair: lineCap; drawOval
+      pair: lineJoin; drawOval
+      pair: morph; drawOval
+      pair: overlay; drawOval
+      pair: width; drawOval
+
+   .. method:: drawOval(quad, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None)
+
+      PDF only: Draw an oval (ellipse) within the given :data:`rect_like` or :data:`quad_like`. See :meth:`Shape.drawOval`.
+
+   .. index::
+      pair: closePath; drawSector
+      pair: color; drawSector
+      pair: dashes; drawSector
+      pair: fill; drawSector
+      pair: fullSector; drawSector
+      pair: lineCap; drawSector
+      pair: lineJoin; drawSector
+      pair: morph; drawSector
+      pair: overlay; drawSector
+      pair: width; drawSector
+
+   .. method:: drawSector(center, point, angle, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, fullSector=True, overlay=True, closePath=False, morph=None)
+
+      PDF only: Draw a circular sector, optionally connecting the arc to the circle's center (like a piece of pie). See :meth:`Shape.drawSector`.
+
+   .. index::
+      pair: closePath; drawPolyline
+      pair: color; drawPolyline
+      pair: dashes; drawPolyline
+      pair: fill; drawPolyline
+      pair: lineCap; drawPolyline
+      pair: lineJoin; drawPolyline
+      pair: morph; drawPolyline
+      pair: overlay; drawPolyline
+      pair: width; drawPolyline
+
+   .. method:: drawPolyline(points, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None)
+
+      PDF only: Draw several connected lines defined by a sequence of :data:`point_like` \s. See :meth:`Shape.drawPolyline`.
+
+
+   .. index::
+      pair: closePath; drawBezier
+      pair: color; drawBezier
+      pair: dashes; drawBezier
+      pair: fill; drawBezier
+      pair: lineCap; drawBezier
+      pair: lineJoin; drawBezier
+      pair: morph; drawBezier
+      pair: overlay; drawBezier
+      pair: width; drawBezier
+
+   .. method:: drawBezier(p1, p2, p3, p4, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None)
+
+      PDF only: Draw a cubic Bézier curve from *p1* to *p4* with the control points *p2* and *p3* (all are :data:`point_like` \s). See :meth:`Shape.drawBezier`.
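+
+      For reference, the curve follows the standard cubic Bézier formula ``B(t) = (1-t)^3 p1 + 3(1-t)^2 t p2 + 3(1-t) t^2 p3 + t^3 p4`` for *t* in [0, 1]. The plain-Python sketch below evaluates this formula and does not call PyMuPDF. The helper name ``bezier_point`` is hypothetical. The sketch shows that the curve passes through *p1* and *p4* but generally not through the control points.
+
```python
def bezier_point(p1, p2, p3, p4, t):
    """Point on the cubic Bezier curve defined by p1..p4 at parameter t."""
    u = 1 - t
    # Bernstein basis polynomials of degree 3.
    b0, b1, b2, b3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = b0 * p1[0] + b1 * p2[0] + b2 * p3[0] + b3 * p4[0]
    y = b0 * p1[1] + b1 * p2[1] + b2 * p3[1] + b3 * p4[1]
    return (x, y)


p1, p2, p3, p4 = (0, 0), (0, 100), (100, 100), (100, 0)
print(bezier_point(p1, p2, p3, p4, 0.0))  # (0.0, 0.0), starts at p1
print(bezier_point(p1, p2, p3, p4, 0.5))  # (50.0, 75.0)
print(bezier_point(p1, p2, p3, p4, 1.0))  # (100.0, 0.0), ends at p4
```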
+
+   .. index::
+      pair: closePath; drawCurve
+      pair: color; drawCurve
+      pair: dashes; drawCurve
+      pair: fill; drawCurve
+      pair: lineCap; drawCurve
+      pair: lineJoin; drawCurve
+      pair: morph; drawCurve
+      pair: overlay; drawCurve
+      pair: width; drawCurve
+
+   .. method:: drawCurve(p1, p2, p3, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, closePath=False, morph=None)
+
+      PDF only: This is a special case of *drawBezier()*. See :meth:`Shape.drawCurve`.
+
+   .. index::
+      pair: closePath; drawRect
+      pair: color; drawRect
+      pair: dashes; drawRect
+      pair: fill; drawRect
+      pair: lineCap; drawRect
+      pair: lineJoin; drawRect
+      pair: morph; drawRect
+      pair: overlay; drawRect
+      pair: width; drawRect
+
+   .. method:: drawRect(rect, color=None, fill=None, width=1, dashes=None, lineCap=0, lineJoin=0, overlay=True, morph=None)
+
+      PDF only: Draw a rectangle. See :meth:`Shape.drawRect`.
+
+      .. note:: An efficient way to background-color a PDF page with the old Python paper color is:
+
+          >>> col = fitz.utils.getColor("py_color")
+          >>> page.drawRect(page.rect, color=col, fill=col, overlay=False)
+
+   .. index::
+      pair: encoding; insertFont
+      pair: fontbuffer; insertFont
+      pair: fontfile; insertFont
+      pair: fontname; insertFont
+      pair: set_simple; insertFont
+
+   .. method:: insertFont(fontname="helv", fontfile=None, fontbuffer=None, set_simple=False, encoding=TEXT_ENCODING_LATIN)
+
+      PDF only: Add a new font to be used by text output methods and return its :data:`xref`. If not already present in the file, the font definition will be added. Supported are the built-in :data:`Base14_Fonts` and the CJK fonts via **"reserved"** fontnames. Fonts can also be provided as a file path or a memory area containing the image of a font file.
+
+      :arg str fontname: The name by which this font shall be referenced when outputting text on this page. In general, you have a "free" choice here (but consult the :ref:`AdobeManual`, page 56, section 3.2.4 for a formal description of building legal PDF names). However, if it matches one of the :data:`Base14_Fonts` or one of the CJK fonts, *fontfile* and *fontbuffer* **are ignored**.
+
+      In other words, you cannot insert a font via *fontfile* / *fontbuffer* and also give it a reserved *fontname*.
+
+      .. note:: A reserved fontname can be specified in any mixture of upper or lower case and still match the right built-in font definition: fontnames "helv", "Helv", "HELV", "Helvetica", etc. all lead to the same font definition "Helvetica". But from a :ref:`Page` perspective, these are **different references**. You can exploit this fact when using different *encoding* variants (Latin, Greek, Cyrillic) of the same font on a page.
+
+      :arg str fontfile: a path to a font file. If used, *fontname* must be **different from all reserved names**.
+
+      :arg bytes/bytearray fontbuffer: the memory image of a font file. If used, *fontname* must be **different from all reserved names**. This parameter would typically be used to transfer fonts between different pages of the same or different PDFs.
+
+      :arg bool set_simple: applicable for *fontfile* / *fontbuffer* cases only: enforce treatment as a "simple" font, i.e. one that only uses character codes up to 255.
+
+      :arg int encoding: applicable for the "Helvetica", "Courier" and "Times" sets of :data:`Base14_Fonts` only. Select one of the available encodings Latin (0), Greek (1) or Cyrillic (2). Only use the default (0 = Latin) for "Symbol" and "ZapfDingbats".
+
+      :rtype: int
+      :returns: the :data:`xref` of the installed font.
+
+      .. note:: Built-in fonts will not lead to the inclusion of a font file, so the resulting PDF file will remain small. However, your PDF viewer software is responsible for generating an appropriate appearance, and viewers **differ** in whether and how they do this. This is especially true for the CJK fonts, but Symbol and ZapfDingbats are also handled incorrectly in some cases. Following are the **Font Names** and their correspondingly installed **Base Font** names:
+
+         **Base-14 Fonts** [#f1]_
+
+         ============= ============================ =========================================
+         **Font Name** **Installed Base Font**      **Comments**
+         ============= ============================ =========================================
+         helv          Helvetica                    normal
+         heit          Helvetica-Oblique            italic
+         hebo          Helvetica-Bold               bold
+         hebi          Helvetica-BoldOblique        bold-italic
+         cour          Courier                      normal
+         coit          Courier-Oblique              italic
+         cobo          Courier-Bold                 bold
+         cobi          Courier-BoldOblique          bold-italic
+         tiro          Times-Roman                  normal
+         tiit          Times-Italic                 italic
+         tibo          Times-Bold                   bold
+         tibi          Times-BoldItalic             bold-italic
+         symb          Symbol                       [#f3]_
+         zadb          ZapfDingbats                 [#f3]_
+         ============= ============================ =========================================
+
+         **CJK Fonts** [#f2]_ (China, Japan, Korea)
+
+         ============= ============================ =========================================
+         **Font Name** **Installed Base Font**      **Comments**
+         ============= ============================ =========================================
+         china-s       Heiti                        simplified Chinese
+         china-ss      Song                         simplified Chinese (serif)
+         china-t       Fangti                       traditional Chinese
+         china-ts      Ming                         traditional Chinese (serif)
+         japan         Gothic                       Japanese
+         japan-s       Mincho                       Japanese (serif)
+         korea         Dotum                        Korean
+         korea-s       Batang                       Korean (serif)
+         ============= ============================ =========================================
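+
+      The case-insensitive matching of reserved names described above can be pictured as a lookup table built from the Base-14 table. The sketch below is for illustration only. The dictionary, the helper name ``resolve_reserved_font``, and the rule of also accepting full base font names such as "Helvetica" are assumptions, not PyMuPDF's internal implementation. CJK names are omitted.
+
```python
# Reserved names taken from the Base-14 table above (CJK names omitted).
BASE14 = {
    "helv": "Helvetica", "heit": "Helvetica-Oblique",
    "hebo": "Helvetica-Bold", "hebi": "Helvetica-BoldOblique",
    "cour": "Courier", "coit": "Courier-Oblique",
    "cobo": "Courier-Bold", "cobi": "Courier-BoldOblique",
    "tiro": "Times-Roman", "tiit": "Times-Italic",
    "tibo": "Times-Bold", "tibi": "Times-BoldItalic",
    "symb": "Symbol", "zadb": "ZapfDingbats",
}
# Assumption: full base font names (e.g. "Helvetica") are accepted as well.
LOOKUP = {**BASE14, **{base.lower(): base for base in BASE14.values()}}


def resolve_reserved_font(fontname):
    """Return the Base-14 font a reserved name maps to, or None if the name
    is not reserved (hypothetical helper for illustration)."""
    return LOOKUP.get(fontname.lower())


# Every reserved spelling prints Helvetica; "MyFont" prints None.
for name in ("helv", "Helv", "HELV", "Helvetica", "MyFont"):
    print(name, "->", resolve_reserved_font(name))
```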
+
+   .. index::
+      pair: filename; insertImage
+      pair: keep_proportion; insertImage
+      pair: overlay; insertImage
+      pair: pixmap; insertImage
+      pair: rotate; insertImage
+      pair: stream; insertImage
+
+   .. method:: insertImage(rect, filename=None, pixmap=None, stream=None, rotate=0, keep_proportion=True, overlay=True)
+
+      PDF only: Put an image inside the given rectangle. The image can be taken from a pixmap, a file or a memory area; **exactly one** of these parameters must be specified.
+
+         *(Changed in v1.14.11)* By default, the image keeps its aspect ratio.
+
+      :arg rect_like rect: where to put the image on the page. Only the rectangle part which is inside the page is used. This intersection must be finite and not empty.
+
+         *(Changed in v1.14.13)* The image is now always placed **centered** in the rectangle, i.e. the center of the image and the rectangle coincide.
+
+      :arg str filename: name of an image file (all formats supported by MuPDF -- see :ref:`ImageFiles`). If the same image is to be inserted multiple times, choose one of the other two options to avoid some overhead.
+
+      :arg bytes,bytearray,io.BytesIO stream: image in memory (all formats supported by MuPDF -- see :ref:`ImageFiles`). This is the most efficient option.
+      
+         *(Changed in v1.14.13)* *io.BytesIO* is now also supported.
+
+      :arg pixmap: a pixmap containing the image.
+      :type pixmap: :ref:`Pixmap`
+
+      :arg int rotate: *(new in v1.14.11)* rotate the image. Must be an integer multiple of 90 degrees. If you need a rotation by an arbitrary angle, consider converting the image to a PDF (:meth:`Document.convertToPDF`) first and then use :meth:`Page.showPDFpage` instead.
+
+      :arg bool keep_proportion: *(new in v1.14.11)* maintain the aspect ratio of the image.
+
+      For a description of *overlay* see :ref:`CommonParms`.
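+
+      The placement rule, with the image centered in *rect* and its aspect ratio kept by default, can be expressed with a little arithmetic. The plain-Python sketch below computes the target rectangle under those two rules. It ignores *rotate* and clipping to the page. The helper name ``fit_centered`` is hypothetical, and so is the assumption that *keep_proportion=False* stretches the image to fill *rect*.
+
```python
def fit_centered(rect, img_width, img_height, keep_proportion=True):
    """Return (x0, y0, x1, y1) where an img_width x img_height image would
    be drawn inside rect = (x0, y0, x1, y1) (hypothetical helper)."""
    x0, y0, x1, y1 = rect
    rw, rh = x1 - x0, y1 - y0
    if not keep_proportion:
        return rect  # assumption: the image is stretched to fill rect
    scale = min(rw / img_width, rh / img_height)
    w, h = img_width * scale, img_height * scale
    # Center the scaled image: the centers of image and rect coincide.
    cx, cy = x0 + rw / 2, y0 + rh / 2
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A 400 x 200 image in a 100 x 100 square becomes 100 x 50, vertically centered.
print(fit_centered((0, 0, 100, 100), 400, 200))  # (0.0, 25.0, 100.0, 75.0)
```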
+
+      This example puts the same image on every page of a document::
+
+         >>> import fitz
+         >>> doc = fitz.open(...)
+         >>> rect = fitz.Rect(0, 0, 50, 50)       # put thumbnail in upper left corner
+         >>> with open("some.jpg", "rb") as f:    # read the image file only once
+         ...     img = f.read()
+         >>> for page in doc:
+         ...     page.insertImage(rect, stream=img)
+         >>> doc.save(...)
+
+      .. note::
+
+         1. If that same image had already been present in the PDF, then only a reference to it will be inserted. This of course considerably saves disk space and processing time. But to detect this fact, existing PDF images need to be compared with the new one. This is achieved by storing an MD5 code for each image in a table and only comparing the new image's MD5 code against the table entries. This MD5 table is generated when the first image is inserted, which may therefore take noticeably longer.
+
+         2. You can use this method to provide a background or foreground image for the page, such as a copyright notice or a watermark. Please remember that watermarks require a transparent image.
+
+         3. The image may be inserted uncompressed, e.g. if a *Pixmap* is used or if the image has an alpha channel. Therefore, consider using *deflate=True* when saving the file.
+
+         4. The image is stored in the PDF in its original quality, which may be much higher than your display requires. In this case, consider decreasing the image size before inserting it, e.g. by using the pixmap option and then shrinking it or scaling it down (see :ref:`Pixmap` chapter). The PIL method *Image.thumbnail()* can also be used for that purpose. The file size savings can be very significant.
+
+         5. The most efficient way to display the same image on multiple pages is another method: :meth:`showPDFpage`. Consult :meth:`Document.convertToPDF` for how to obtain intermediary PDFs usable for that method. Demo script `fitz-logo.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/fitz-logo.py>`_ implements a fairly complete approach.
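+
+      Note 1 above describes a table of MD5 codes that is used to avoid storing the same image twice. The sketch below shows the idea with Python's ``hashlib``. The class name, the *xref* counter and the method name are hypothetical stand-ins, not PyMuPDF internals.
+
```python
import hashlib


class ImageRegistry:
    """Map MD5 digests of image data to the xref they were stored under
    (hypothetical stand-in for PyMuPDF's internal table)."""

    def __init__(self, next_xref=1):
        self._by_digest = {}
        self._next_xref = next_xref

    def insert(self, data):
        """Return (xref, is_new) for the image bytes `data`."""
        digest = hashlib.md5(data).digest()
        if digest in self._by_digest:
            # Same content seen before: only a reference is needed.
            return self._by_digest[digest], False
        xref = self._next_xref
        self._next_xref += 1
        self._by_digest[digest] = xref
        return xref, True


reg = ImageRegistry(next_xref=10)
print(reg.insert(b"jpeg bytes A"))  # (10, True)
print(reg.insert(b"jpeg bytes B"))  # (11, True)
print(reg.insert(b"jpeg bytes A"))  # (10, False)
```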
+
+   .. index::
+      pair: blocks; getText
+      pair: dict; getText
+      pair: flags; getText
+      pair: html; getText
+      pair: json; getText
+      pair: rawdict; getText
+      pair: text; getText
+      pair: words; getText
+      pair: xhtml; getText
+      pair: xml; getText
+
+   .. method:: getText(opt="text", flags=None)
+
+      Retrieves the content of a page in a variety of formats. This is a wrapper for :ref:`TextPage` methods, where the output option selects the method as follows:
+
+      * "text" -- :meth:`TextPage.extractTEXT`, default
+      * "blocks" -- :meth:`TextPage.extractBLOCKS`
+      * "words" -- :meth:`TextPage.extractWORDS`
+      * "html" -- :meth:`TextPage.extractHTML`
+      * "xhtml" -- :meth:`TextPage.extractXHTML`
+      * "xml" -- :meth:`TextPage.extractXML`
+      * "dict" -- :meth:`TextPage.extractDICT`
+      * "json" -- :meth:`TextPage.extractJSON`
+      * "rawdict" -- :meth:`TextPage.extractRAWDICT`
+
+      :arg str opt: A string indicating the requested format, one of the above. A mixture of upper and lower case is supported.
+
+         *(Changed in v1.16.3)* Values "words" and "blocks" are now also accepted.
+
+      :arg int flags: *(new in version 1.16.2)* indicator bits to control whether to include images or how text should be handled with respect to white spaces and ligatures. See :ref:`TextPreserve` for available indicators and :ref:`text_extraction_flags` for default settings.
+
+      :rtype: *str, list, dict*
+      :returns: The page's content as a string, list or as a dictionary. Refer to the corresponding :ref:`TextPage` method for details.
+
+      .. note:: You can use this method as a **document conversion tool** from any supported document type (not only PDF!) to one of TEXT, HTML, XHTML or XML documents.
+
+   .. index::
+      pair: flags; getTextPage
+
+   .. method:: getTextPage(flags=3)
+
+      *(New in version 1.16.5)*
+      
+      Create a :ref:`TextPage` for the page. This method avoids using an intermediate :ref:`DisplayList`.
+
+      :arg int flags: indicator bits controlling the content available for subsequent extraction -- see the parameter of :meth:`Page.getText`.
+
+      :returns: :ref:`TextPage`
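+
+      *flags* is a bit mask, so indicators are combined with bitwise OR. The sketch below assumes the values 1, 2 and 4 for *TEXT_PRESERVE_LIGATURES*, *TEXT_PRESERVE_WHITESPACE* and *TEXT_PRESERVE_IMAGES*. Under that assumption, the default *flags=3* means "preserve ligatures and white space, but no images". See :ref:`TextPreserve` for the authoritative values. The helper ``describe_flags`` is hypothetical.
+
```python
# Assumed to mirror fitz.TEXT_PRESERVE_* (see the TextPreserve section).
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4


def describe_flags(flags):
    """List the names of the indicator bits set in `flags`."""
    names = {
        TEXT_PRESERVE_LIGATURES: "ligatures",
        TEXT_PRESERVE_WHITESPACE: "whitespace",
        TEXT_PRESERVE_IMAGES: "images",
    }
    return [name for bit, name in names.items() if flags & bit]


default = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
print(default, describe_flags(default))  # 3 ['ligatures', 'whitespace']
print(describe_flags(default | TEXT_PRESERVE_IMAGES))
# ['ligatures', 'whitespace', 'images']
```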
+
+   .. method:: getFontList(full=False)
+
+      PDF only: Return a list of fonts referenced by the page. Wrapper for :meth:`Document.getPageFontList`.
+
+   .. method:: getImageList(full=False)
+
+      PDF only: Return a list of images referenced by the page. Wrapper for :meth:`Document.getPageImageList`.
+
+   .. method:: getImageBbox(item)
+
+      PDF only: Return the boundary box of an image.
+
+      *Changed in version 1.17.0:*
+
+      * The method should deliver correct results now.
+      * The page's ``/Contents`` are no longer modified by this method.
+      
+      :arg list,str item: an item of the list :meth:`Page.getImageList` with *full=True* specified, or the **name** entry of such an item, which is item[-3] (or item[7] respectively).
+
+      :rtype: :ref:`Rect`
+      :returns: the boundary box of the image.
+         *(Changed in version 1.16.7)* If the page in fact does not display this image, an infinite rectangle is returned now. In previous versions, an exception was raised.
+         *(Changed in version 1.17.0)* Only images referenced directly by the page are considered. This means that images occurring in embedded PDF pages are ignored and an exception is raised.
+
+      .. note::
+
+         * Be aware that :meth:`Page.getImageList` may contain "dead" entries, i.e. there may be image references which are **not displayed** by this page. In this case an infinite rectangle is returned.
+         * As mentioned above, images inside embedded PDF pages are ignored by this method.
+
+   .. index::
+      pair: matrix; getSVGimage
+
+   .. method:: getSVGimage(matrix=fitz.Identity)
+
+      Create an SVG image from the page. Only full page images are currently supported.
+
+      :arg matrix_like matrix: a matrix, default is :ref:`Identity`.
+
+      :returns: a UTF-8 encoded string that contains the image. Because SVG has XML syntax it can be saved in a text file with extension *.svg*.
+
+   .. index::
+      pair: alpha; getPixmap
+      pair: annots; getPixmap
+      pair: clip; getPixmap
+      pair: colorspace; getPixmap
+      pair: matrix; getPixmap
+
+   .. method:: getPixmap(matrix=fitz.Identity, colorspace=fitz.csRGB, clip=None, alpha=False, annots=True)
+
+     Create a pixmap from the page. This is probably the most often used method to create a :ref:`Pixmap`.
+
+     :arg matrix_like matrix: default is :ref:`Identity`.
+     :arg colorspace: Defines the required colorspace, one of "GRAY", "RGB" or "CMYK" (case insensitive). Or specify a :ref:`Colorspace`, i.e. one of the predefined ones: :data:`csGRAY`, :data:`csRGB` or :data:`csCMYK`.
+     :type colorspace: str or :ref:`Colorspace`
+     :arg irect_like clip: restrict rendering to this area.
+     :arg bool alpha: whether to add an alpha channel. Always accept the default *False* if you do not really need transparency. This will save a lot of memory (25% in case of RGB ... and pixmaps are typically **large**!), and also processing time. Also note an **important difference** in how the image will be rendered: with *True* the pixmap's samples area will be pre-cleared with *0x00*. This results in **transparent** areas where the page is empty. With *False* the pixmap's samples will be pre-cleared with *0xff*. This results in **white** where the page has nothing to show.
+
+      Changed in version 1.14.17
+         The default alpha value is now *False*.
+
+         * Generated with *alpha=True*
+
+         .. image:: images/img-alpha-1.png
+
+
+         * Generated with *alpha=False*
+
+         .. image:: images/img-alpha-0.png
+
+     :arg bool annots: *(new in version 1.16.0)* whether to also render annotations or to suppress them. You can create pixmaps for annotations separately.
+
+     :rtype: :ref:`Pixmap`
+     :returns: Pixmap of the page. For fine-controlling the generated image, by far the most important parameter is **matrix**. For example, you can increase or decrease the image resolution by using **Matrix(xzoom, yzoom)**. If zoom > 1, you will get a higher resolution: zoom=2 doubles the number of pixels in that direction and thus generates an image twice as large in that direction. Negative values flip the image horizontally or vertically, respectively. Matrices also let you rotate or shear, and you can combine effects, e.g. via matrix multiplication. See the :ref:`Matrix` section to learn more.
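To get a feel for the numbers involved, the following sketch estimates the pixel dimensions and sample size of a page pixmap for a given zoom factor. It assumes that the default :ref:`Identity` matrix yields one pixel per point, and it ignores the rounding MuPDF applies to the page's bounding box. The helper name *pixmap_bytes* is hypothetical and not part of PyMuPDF.

```python
def pixmap_bytes(page_width, page_height, zoom, n_colors=3, alpha=False):
    """Estimate (width, height, byte size) of a rendered page pixmap.

    Assumption: 1 pixel per point at zoom 1; rounding details are ignored.
    """
    width = int(page_width * zoom)
    height = int(page_height * zoom)
    n = n_colors + (1 if alpha else 0)  # bytes per pixel
    return width, height, width * height * n


# ISO A4 page (595 x 842 points) rendered with Matrix(2, 2)
w, h, rgb = pixmap_bytes(595, 842, 2)
_, _, rgba = pixmap_bytes(595, 842, 2, alpha=True)
print(w, h)             # 1190 1684
print(1 - rgb / rgba)   # 0.25 -> dropping alpha saves 25 %
```

Doubling the zoom quadruples the sample size. High zoom factors combined with *alpha=True* therefore add up quickly.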
+
+   .. method:: annot_names()
+
+      *(New in version 1.16.10)*
+
+      PDF only: return a list of the names of annotations, widgets and links. Technically, these are the */NM* values of every PDF object found in the page's */Annots* array.
+
+      :rtype: list
+
+
+   .. method:: annot_xrefs()
+
+      *(New in version 1.17.1)*
+
+      PDF only: return a list of the :data:`xref` numbers of annotations, widgets and links -- technically of all entries found in the page's */Annots* array.
+
+      :rtype: list
+      :returns: a list of items *(xref, type)* where type is the annotation type. Use the type to tell apart links, fields and annotations, see :ref:`AnnotationTypes`.
+
+
+   .. method:: load_annot(ident)
+
+      *(Deprecated since v1.17.1)*.
+
+   .. method:: loadAnnot(ident)
+
+      *(New in version 1.17.1)*
+
+      PDF only: return the annotation identified by *ident*. This may be its unique name (PDF */NM* key), or its :data:`xref`.
+
+      :arg str,int ident: the annotation name or xref.
+
+      :rtype: :ref:`Annot`
+      :returns: the annotation or *None*.
+
+      .. note:: Methods :meth:`Page.annot_names` and :meth:`Page.annot_xrefs` provide lists of names or xrefs, respectively, from which an item may be picked and loaded via this method.
+
+   .. method:: loadLinks()
+
+      Return the first link on a page. Synonym of property :attr:`firstLink`.
+
+      :rtype: :ref:`Link`
+      :returns: first link on the page (or *None*).
+
+   .. index::
+      pair: rotate; setRotation
+
+   .. method:: setRotation(rotate)
+
+      PDF only: Sets the rotation of the page.
+
+      :arg int rotate: An integer specifying the required rotation in degrees. Must be an integer multiple of 90. Values will be converted to one of 0, 90, 180, 270.
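The conversion to one of 0, 90, 180, 270 can be pictured as a modulo-360 operation. The sketch below is a plain-Python illustration under that assumption and is not PyMuPDF's own code. This section does not say whether PyMuPDF rejects non-multiples of 90 with an exception, so the *ValueError* is an illustrative choice. The helper name is hypothetical.

```python
def normalize_rotation(rotate: int) -> int:
    """Map any multiple of 90 onto one of 0, 90, 180, 270 (illustrative)."""
    if rotate % 90 != 0:
        raise ValueError("rotation must be a multiple of 90")
    return rotate % 360  # Python's % is non-negative for a positive divisor


print(normalize_rotation(-90))  # 270
print(normalize_rotation(450))  # 90
```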
+
+   .. index::
+      pair: clip; showPDFpage
+      pair: keep_proportion; showPDFpage
+      pair: overlay; showPDFpage
+      pair: rotate; showPDFpage
+
+   .. method:: showPDFpage(rect, docsrc, pno=0, keep_proportion=True, overlay=True, rotate=0, clip=None)
+
+      PDF only: Display a page of another PDF as a **vector image** (otherwise similar to :meth:`Page.insertImage`). This is a multi-purpose method. For example, you can use it to
+
+      * create "n-up" versions of existing PDF files, combining several input pages into **one output page** (see example `4-up.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/4-up.py>`_),
+      * create "posterized" PDF files, i.e. every input page is split up in parts which each create a separate output page (see `posterize.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/posterize.py>`_),
+      * include PDF-based vector images like company logos, watermarks, etc., see `svg-logo.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/svg-logo.py>`_, which puts an SVG-based logo on each page (requires additional packages to deal with SVG-to-PDF conversions).
+
+      Changed in version 1.14.11
+         Parameter *reuse_xref* has been deprecated.
+
+      :arg rect_like rect: where to place the image on current page. Must be finite and its intersection with the page must not be empty.
+
+          Changed in version 1.14.11
+             Position the source rectangle centered in this rectangle.
+
+      :arg docsrc: source PDF document containing the page. Must be a different document object, but may be the same file.
+      :type docsrc: :ref:`Document`
+
+      :arg int pno: page number (0-based, in *-inf < pno < docsrc.pageCount*) to be shown.
+
+      :arg bool keep_proportion: whether to maintain the width-height ratio (default). If *False*, all 4 corners are always positioned on the border of the target rectangle, whatever the rotation value. In general, this will deliver distorted and/or non-rectangular images.
+
+      :arg bool overlay: put image in foreground (default) or background.
+
+      :arg float rotate: *(new in version 1.14.10)* show the source rectangle rotated by some angle. *Changed in version 1.14.11:* Any angle is now supported.
+
+      :arg rect_like clip: choose which part of the source page to show. Default is the full page, else must be finite and its intersection with the source page must not be empty.
+
+      .. note:: In contrast to method :meth:`Document.insertPDF`, this method does not copy annotations or links, so they are not shown. But all its **other resources (text, images, fonts, etc.)** will be imported into the current PDF. They will therefore appear in text extractions and in :meth:`getFontList` and :meth:`getImageList` lists -- even if they are not contained in the visible area given by *clip*.
+
+      Example: Show the same source page, rotated by 90 and by -90 degrees:
+
+      >>> import fitz
+      >>> doc = fitz.open()  # new empty PDF
+      >>> page = doc.newPage()  # new page in A4 format
+      >>>
+      >>> # upper half page
+      >>> r1 = fitz.Rect(0, 0, page.rect.width, page.rect.height/2)
+      >>>
+      >>> # lower half page
+      >>> r2 = r1 + (0, page.rect.height/2, 0, page.rect.height/2)
+      >>>
+      >>> src = fitz.open("PyMuPDF.pdf")  # show page 0 of this
+      >>>
+      >>> page.showPDFpage(r1, src, 0, rotate=90)
+      >>> page.showPDFpage(r2, src, 0, rotate=-90)
+      >>> doc.save("show.pdf")
+
+      .. image:: images/img-showpdfpage.jpg
+         :scale: 70
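With *keep_proportion=True*, the source page is scaled uniformly and centered in *rect*. The sketch below computes such a target rectangle for an unrotated source, using plain tuples *(x0, y0, x1, y1)*. It only illustrates the geometry and is not PyMuPDF's internal algorithm, which also handles *rotate* and *clip*. The name *fit_centered* is hypothetical.

```python
def fit_centered(src_width, src_height, target):
    """Largest rectangle with the source's aspect ratio, centered in target."""
    x0, y0, x1, y1 = target
    tw, th = x1 - x0, y1 - y0
    scale = min(tw / src_width, th / src_height)  # uniform scale factor
    w, h = src_width * scale, src_height * scale
    left = x0 + (tw - w) / 2   # center horizontally
    top = y0 + (th - h) / 2    # center vertically
    return (left, top, left + w, top + h)


# An A4 portrait source page shown in the upper half of an A4 page
print(fit_centered(595, 842, (0, 0, 595, 421)))  # (148.75, 0.0, 446.25, 421.0)
```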
+
+   .. method:: newShape()
+
+      PDF only: Create a new :ref:`Shape` object for the page.
+
+      :rtype: :ref:`Shape`
+      :returns: a new :ref:`Shape` to use for compound drawings. See description there.
+
+
+   .. index::
+      pair: flags; searchFor
+      pair: hit_max; searchFor
+      pair: quads; searchFor
+
+   .. method:: searchFor(text, hit_max=16, quads=False, flags=None)
+
+      Searches for *text* on a page. Wrapper for :meth:`TextPage.search`.
+
+      :arg str text: Text to search for. Upper / lower case is ignored. The string may contain spaces.
+
+      :arg int hit_max: Maximum number of occurrences accepted.
+      :arg bool quads: Return :ref:`Quad` instead of :ref:`Rect` objects.
+      :arg int flags: Control the data extracted by the underlying :ref:`TextPage`. Default is 0 (ligatures are dissolved, white space is replaced with space and excessive spaces are not suppressed).
+
+      :rtype: list
+
+      :returns: A list of :ref:`Rect` \s (resp. :ref:`Quad` \s), each of which **normally** surrounds one occurrence of *text*. **However:** if the search string spans more than one line, a separate item is recorded in the list for each part of the string per line. So if you are looking for "search string" and the two words happen to be located on separate lines, two entries will be recorded in the list: one for "search" and one for "string".
+
+        .. note:: This behavior supports multi-line text marker annotations.
+
+
+   .. method:: setMediaBox(r)
+
+      PDF only: *(New in v1.16.13)* Change the physical page dimensions by setting :data:`MediaBox` in the page's object definition.
+
+      :arg rect_like r: the new :data:`MediaBox` value.
+
+      .. note:: This method also sets the page's :data:`CropBox` to the same value -- to prevent mismatches caused by values further up in the parent hierarchy.
+
+      .. caution:: For existing pages this may have unexpected effects if painting commands depend on a certain setting, and may lead to an empty or distorted appearance.
+
+
+   .. method:: setCropBox(r)
+
+      PDF only: change the visible part of the page.
+
+      :arg rect_like r: the new visible area of the page. Note that this **must** be specified in **unrotated coordinates**.
+
+      After execution, if the page is not rotated, :attr:`Page.rect` will equal this rectangle, shifted to the top-left position (0, 0). Example session:
+
+      >>> page = doc.newPage()
+      >>> page.rect
+      fitz.Rect(0.0, 0.0, 595.0, 842.0)
+      >>>
+      >>> page.CropBox                   # CropBox and MediaBox still equal
+      fitz.Rect(0.0, 0.0, 595.0, 842.0)
+      >>>
+      >>> # now set CropBox to a part of the page
+      >>> page.setCropBox(fitz.Rect(100, 100, 400, 400))
+      >>> # this will also change the "rect" property:
+      >>> page.rect
+      fitz.Rect(0.0, 0.0, 300.0, 300.0)
+      >>>
+      >>> # but MediaBox remains unaffected
+      >>> page.MediaBox
+      fitz.Rect(0.0, 0.0, 595.0, 842.0)
+      >>>
+      >>> # revert everything we did
+      >>> page.setCropBox(page.MediaBox)
+      >>> page.rect
+      fitz.Rect(0.0, 0.0, 595.0, 842.0)
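Ignoring rotation, the relation between *CropBox* and :attr:`Page.rect` in the session above is a translation to the origin. The plain-tuple sketch below shows this arithmetic. The helper name is hypothetical and not part of PyMuPDF.

```python
def rect_from_cropbox(cropbox):
    """Shift an unrotated CropBox (x0, y0, x1, y1) so its top-left is (0, 0)."""
    x0, y0, x1, y1 = cropbox
    return (0.0, 0.0, float(x1 - x0), float(y1 - y0))


print(rect_from_cropbox((100, 100, 400, 400)))  # (0.0, 0.0, 300.0, 300.0)
```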
+
+   .. attribute:: rotation
+
+      Contains the rotation of the page in degrees (always 0 for non-PDF types).
+
+      :type: int
+
+   .. attribute:: CropBoxPosition
+
+      Contains the top-left point of the page's */CropBox* for a PDF, otherwise *Point(0, 0)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: CropBox
+
+      The page's */CropBox* for a PDF. The **unrotated** page rectangle is always returned. For a non-PDF, this always equals the page rectangle.
+
+      :type: :ref:`Rect`
+
+   .. attribute:: MediaBoxSize
+
+      Contains the width and height of the page's :attr:`Page.MediaBox` for a PDF, otherwise the bottom-right coordinates of :attr:`Page.rect`.
+
+      :type: :ref:`Point`
+
+   .. attribute:: MediaBox
+
+      The page's :data:`MediaBox` for a PDF, otherwise :attr:`Page.rect`.
+
+      :type: :ref:`Rect`
+
+      .. note:: For most PDF documents and for **all other document types**, *page.rect == page.CropBox == page.MediaBox* is true. However, for some PDFs the visible page is a proper subset of :data:`MediaBox`. Also, if the page is rotated, its ``Page.rect`` may not equal ``Page.CropBox``. In these cases the above attributes help to correctly locate page elements.
+
+   .. attribute:: transformationMatrix
+
+      This matrix translates coordinates from the PDF space to the MuPDF space. For example, in PDF ``/Rect [x0 y0 x1 y1]`` the pair (x0, y0) specifies the **bottom-left** point of the rectangle, in contrast to MuPDF's system, where (x0, y0) specifies the top-left point. Multiplying the PDF coordinates by this matrix yields the (Py-) MuPDF version of the rectangle. Conversely, multiplying by the inverse matrix yields the PDF rectangle again.
+
+      :type: :ref:`Matrix`
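For an unrotated page whose :data:`MediaBox` starts at (0, 0), this conversion amounts to flipping the y-axis. The sketch below assumes a matrix of the form *(1, 0, 0, -1, 0, height)*. That form is chosen for illustration, so real code should always use the page's actual :attr:`transformationMatrix`. The matrix is applied with MuPDF's row-vector convention *(x, y) -> (a*x + c*y + e, b*x + d*y + f)*. The helper names are hypothetical.

```python
def transform_point(m, x, y):
    """Apply a 6-tuple matrix (a, b, c, d, e, f) to a point."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def pdf_rect_to_mupdf(m, rect):
    """Transform a PDF /Rect [x0 y0 x1 y1] and re-normalize it."""
    x0, y0, x1, y1 = rect
    p = transform_point(m, x0, y0)
    q = transform_point(m, x1, y1)
    # The y-flip swaps top and bottom, so sort the coordinates again.
    return (min(p[0], q[0]), min(p[1], q[1]), max(p[0], q[0]), max(p[1], q[1]))


height = 842                    # ISO A4 page height in points
m = (1, 0, 0, -1, 0, height)    # assumed y-flip matrix (see lead-in)
print(pdf_rect_to_mupdf(m, (50, 700, 250, 800)))  # (50, 42, 250, 142)
```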
+
+   .. attribute:: rotationMatrix
+
+   .. attribute:: derotationMatrix
+
+      These matrices may be used for dealing with rotated PDF pages. When adding or inserting anything on a PDF page with PyMuPDF, the coordinates of the **unrotated** page are always used. These matrices help translate between the two states. Example: if an A4 page is rotated by 90 degrees, where does its top-left Point(0, 0) end up?
+
+         >>> page.setRotation(90)  # rotate an ISO A4 page
+         >>> page.rect
+         Rect(0.0, 0.0, 842.0, 595.0)
+         >>> p = fitz.Point(0, 0)  # where did top-left point land?
+         >>> p * page.rotationMatrix
+         Point(842.0, 0.0)
+
+      :type: :ref:`Matrix`
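The numbers in the session above can be reproduced by hand. The sketch below assumes that, for a 90 degree rotation, the matrix has the form *(0, 1, -1, 0, H, 0)*, where *H* is the unrotated page height. This form was chosen to match the output above and is not taken from the PyMuPDF sources. The sketch transforms the top-left corner and derives the counterpart of :attr:`derotationMatrix` as the matrix inverse. The helper names are hypothetical.

```python
def transform_point(m, x, y):
    """Apply a 6-tuple matrix (a, b, c, d, e, f) to a point."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def invert(m):
    """Inverse of an invertible 6-tuple affine matrix."""
    a, b, c, d, e, f = m
    det = a * d - b * c
    return (d / det, -b / det, -c / det, a / det,
            (c * f - d * e) / det, (b * e - a * f) / det)


H = 842                       # unrotated A4 height in points
rot90 = (0, 1, -1, 0, H, 0)   # assumed rotation matrix for rotation 90
derot90 = invert(rot90)       # maps rotated coordinates back

print(transform_point(rot90, 0, 0))      # (842, 0)
print(transform_point(derot90, 842, 0))  # (0.0, 0.0)
```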
+
+   .. attribute:: firstLink
+
+      Contains the first :ref:`Link` of a page (or *None*).
+
+      :type: :ref:`Link`
+
+   .. attribute:: firstAnnot
+
+      Contains the first :ref:`Annot` of a page (or *None*).
+
+      :type: :ref:`Annot`
+
+   .. attribute:: firstWidget
+
+      Contains the first :ref:`Widget` of a page (or *None*).
+
+      :type: :ref:`Widget`
+
+   .. attribute:: number
+
+      The page number.
+
+      :type: int
+
+   .. attribute:: parent
+
+      The owning document object.
+
+      :type: :ref:`Document`
+
+
+   .. attribute:: rect
+
+      Contains the rectangle of the page. Same as result of :meth:`Page.bound()`.
+
+      :type: :ref:`Rect`
+
+   .. attribute:: xref
+
+      The page's PDF :data:`xref`. Zero if not a PDF.
+
+      :type: int
+
+-----
+
+Description of *getLinks()* Entries
+----------------------------------------
+Each entry of the *getLinks()* list is a dictionary with the following keys:
+
+* *kind*:  (required) an integer indicating the kind of link. This is one of *LINK_NONE*, *LINK_GOTO*, *LINK_GOTOR*, *LINK_LAUNCH*, or *LINK_URI*. For values and meaning of these names refer to :ref:`linkDest Kinds`.
+
+* *from*:  (required) a :ref:`Rect` describing the "hot spot" location on the page's visible representation (where the cursor changes to a hand image, usually).
+
+* *page*:  a 0-based integer indicating the destination page. Required for *LINK_GOTO* and *LINK_GOTOR*, else ignored.
+
+* *to*:   either a *fitz.Point*, specifying the destination location on the provided page, default is *fitz.Point(0, 0)*, or a symbolic (indirect) name. If an indirect name is specified, *page = -1* is required and the name must be defined in the PDF in order for this to work. Required for *LINK_GOTO* and *LINK_GOTOR*, else ignored.
+
+* *file*: a string specifying the destination file. Required for *LINK_GOTOR* and *LINK_LAUNCH*, else ignored.
+
+* *uri*:  a string specifying the destination internet resource. Required for *LINK_URI*, else ignored.
+
+* *xref*: an integer specifying the PDF :data:`xref` of the link object. Do not change this entry in any way. Required for link deletion and update, otherwise ignored. For non-PDF documents, this entry contains *-1*. It is also *-1* for **all** entries in the *getLinks()* list, if **any** of the links is not supported by MuPDF - see the note below.
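The rules above, which state which keys are required for which *kind*, can be summarized in a small checker. This is a hypothetical helper for illustration and is not part of PyMuPDF. The *LINK_** names are local stand-ins with arbitrary numeric values, so real code should use *fitz.LINK_GOTO* and the related constants instead.

```python
# Local stand-ins for the fitz.LINK_* constants (numeric values arbitrary here).
LINK_NONE, LINK_GOTO, LINK_GOTOR, LINK_LAUNCH, LINK_URI = range(5)

# Keys required in addition to "kind" and "from", per link kind.
EXTRA_KEYS = {
    LINK_NONE: set(),
    LINK_GOTO: {"page", "to"},
    LINK_GOTOR: {"page", "to", "file"},
    LINK_LAUNCH: {"file"},
    LINK_URI: {"uri"},
}


def missing_keys(link):
    """Return the sorted list of required keys absent from a link dict."""
    required = {"kind", "from"} | EXTRA_KEYS.get(link.get("kind"), set())
    return sorted(required - set(link))


link = {"kind": LINK_URI, "from": (10, 10, 120, 24), "uri": "https://example.com"}
print(missing_keys(link))                                        # []
print(missing_keys({"kind": LINK_GOTO, "from": (0, 0, 1, 1)}))   # ['page', 'to']
```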
+
+Notes on Supporting Links
+---------------------------
+MuPDF's support for links has changed in **v1.10a**. These changes affect link types :data:`LINK_GOTO` and :data:`LINK_GOTOR`.
+
+Reading (pertains to method *getLinks()* and the *firstLink* property chain)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If MuPDF detects a link to another file, it will supply either a *LINK_GOTOR* or a *LINK_LAUNCH* link kind. In case of *LINK_GOTOR*, destination details may be given either as a page number (possibly including position information) or as an indirect destination.
+
+If an indirect destination is given, then this is indicated by *page = -1*, and *link.dest.dest* will contain this name. The dictionaries in the *getLinks()* list will contain this information as the *to* value.
+
+**Internal links are always** of kind *LINK_GOTO*. If an internal link specifies an indirect destination, it **will always be resolved** and the resulting direct destination will be returned. Names are **never returned for internal links**, and undefined destinations will cause the link to be ignored.
+
+Writing
+~~~~~~~~~
+
+PyMuPDF writes (updates, inserts) links by constructing and writing the appropriate PDF object **source**. This makes it possible to specify indirect destinations for *LINK_GOTOR* **and** *LINK_GOTO* link kinds (pre *PDF 1.2* file formats are **not supported**).
+
+.. warning:: If a *LINK_GOTO* indirect destination specifies an undefined name, MuPDF / PyMuPDF will subsequently not be able to find or read this link. Other readers, however, **will** detect it, but flag it as erroneous.
+
+In general, indirect *LINK_GOTOR* destinations cannot be checked for validity and are therefore **always accepted**.
+
+Homologous Methods of :ref:`Document` and :ref:`Page`
+--------------------------------------------------------
+This is an overview of homologous methods on the :ref:`Document` and on the :ref:`Page` level.
+
+====================================== =====================================
+**Document Level**                     **Page Level**
+====================================== =====================================
+*Document.getPageFontList(pno)*        :meth:`Page.getFontList`
+*Document.getPageImageList(pno)*       :meth:`Page.getImageList`
+*Document.getPagePixmap(pno, ...)*     :meth:`Page.getPixmap`
+*Document.getPageText(pno, ...)*       :meth:`Page.getText`
+*Document.searchPageFor(pno, ...)*     :meth:`Page.searchFor`
+====================================== =====================================
+
+The page number *pno* is a 0-based integer *-inf < pno < pageCount*.
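A negative *pno* counts from the end of the document. The sketch below shows one way to map such a value onto a valid 0-based index, which is to repeatedly add the page count. It illustrates the stated range only, and the helper name is hypothetical.

```python
def normalize_pno(pno: int, page_count: int) -> int:
    """Map -inf < pno < page_count onto range(page_count)."""
    if page_count <= 0 or pno >= page_count:
        raise ValueError("page number out of range")
    while pno < 0:
        pno += page_count  # -1 is the last page, -page_count the first
    return pno


print(normalize_pno(-1, 10))   # 9
print(normalize_pno(-11, 10))  # 9
```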
+
+.. note::
+
+   Most document methods (left column) exist for convenience and are just wrappers for *Document[pno].<page method>*. So they **load and discard the page** on each execution.
+
+   However, the first two methods work differently. They only need a page's object definition statement; the page itself will **not** be loaded. So, for example, :meth:`Page.getFontList` is a wrapper the other way round and is defined as follows: *page.getFontList() == page.parent.getPageFontList(page.number)*.
+
+.. rubric:: Footnotes
+
+.. [#f1] If your existing code already uses the installed base name as a font reference (as it was supported by PyMuPDF versions earlier than 1.14), this will continue to work.
+
+.. [#f2] Not all PDF reader software (including internet browsers and office software) displays all of these fonts. And if it does, the difference between the **serifed** and the **non-serifed** version may hardly be noticeable. But serifed and non-serifed versions lead to different installed base fonts, which improves the chance that one of them is displayable with your specific PDF viewer.
+
+.. [#f3] Not all PDF readers display these fonts at all. Some others do, but use incorrect character spacing, etc.
+
+.. [#f4] You are generally free to choose any of the :ref:`mupdficons` you consider adequate.
diff --git a/docs/pixmap.rst b/docs/pixmap.rst

new file mode 100644 (file)

index 0000000..244a33e
--- /dev/null
+++ b/docs/pixmap.rst
@@ -0,0 +1,453 @@
+.. _Pixmap:
+
+================
+Pixmap
+================
+
+Pixmaps ("pixel maps") are objects at the heart of MuPDF's rendering capabilities. They represent plane rectangular sets of pixels. Each pixel is described by a number of bytes ("components") defining its color, plus an optional alpha byte defining its transparency.
+
+In PyMuPDF, there are several ways to create a pixmap. Except for the first one, all of them are available as overloaded constructors. A pixmap can be created ...
+
+1. from a document page (method :meth:`Page.getPixmap`)
+2. empty, based on :ref:`Colorspace` and :ref:`IRect` information
+3. from a file
+4. from an in-memory image
+5. from a memory area of plain pixels
+6. from an image inside a PDF document
+7. as a copy of another pixmap
+
+.. note:: A number of image formats are supported as input for points 3. and 4. above. See section :ref:`ImageFiles`.
+
+Have a look at the :ref:`FAQ` section to see some pixmap usage "at work".
+
+============================= ===================================================
+**Method / Attribute**        **Short Description**
+============================= ===================================================
+:meth:`Pixmap.clearWith`      clear parts of a pixmap
+:meth:`Pixmap.copyPixmap`     copy parts of another pixmap
+:meth:`Pixmap.gammaWith`      apply a gamma factor to the pixmap
+:meth:`Pixmap.getImageData`   return a memory area in a variety of formats
+:meth:`Pixmap.getPNGData`     return a PNG as a memory area
+:meth:`Pixmap.invertIRect`    invert the pixels of a given area
+:meth:`Pixmap.pillowWrite`    save as image using pillow (experimental)
+:meth:`Pixmap.pillowData`     write image stream using pillow (experimental)
+:meth:`Pixmap.pixel`          return the value of a pixel
+:meth:`Pixmap.setAlpha`       set alpha values
+:meth:`Pixmap.setPixel`       set the color of a pixel
+:meth:`Pixmap.setRect`        set the color of a rectangle
+:meth:`Pixmap.setResolution`  set the image resolution
+:meth:`Pixmap.shrink`         reduce size keeping proportions
+:meth:`Pixmap.tintWith`       tint a pixmap with a color
+:meth:`Pixmap.writeImage`     save a pixmap in a variety of formats
+:meth:`Pixmap.writePNG`       save a pixmap as a PNG file
+:attr:`Pixmap.alpha`          transparency indicator
+:attr:`Pixmap.colorspace`     pixmap's :ref:`Colorspace`
+:attr:`Pixmap.height`         pixmap height
+:attr:`Pixmap.interpolate`    interpolation method indicator
+:attr:`Pixmap.irect`          :ref:`IRect` of the pixmap
+:attr:`Pixmap.n`              bytes per pixel
+:attr:`Pixmap.samples`        pixel area
+:attr:`Pixmap.size`           pixmap's total length
+:attr:`Pixmap.stride`         size of one image row
+:attr:`Pixmap.width`          pixmap width
+:attr:`Pixmap.x`              X-coordinate of top-left corner
+:attr:`Pixmap.xres`           resolution in X-direction
+:attr:`Pixmap.y`              Y-coordinate of top-left corner
+:attr:`Pixmap.yres`           resolution in Y-direction
+============================= ===================================================
+
+**Class API**
+
+.. class:: Pixmap
+
+   .. method:: __init__(self, colorspace, irect, alpha)
+
+      **New empty pixmap:** Create an empty pixmap of size and origin given by the rectangle. So, *irect.top_left* designates the top left corner of the pixmap, and its width and height are *irect.width* and *irect.height*, respectively. Note that the image area is **not initialized** and will contain leftover garbage data; use e.g. :meth:`clearWith` or :meth:`setRect` to initialize it.
+
+      :arg colorspace: colorspace.
+      :type colorspace: :ref:`Colorspace`
+
+      :arg irect_like irect: The pixmap's position and dimensions.
+
+      :arg bool alpha: Specifies whether transparency bytes should be included. Default is *False*.
+
+   .. method:: __init__(self, colorspace, source)
+
+      **Copy and set colorspace:** Copy *source* pixmap converting colorspace. Any colorspace combination is possible, but source colorspace must not be *None*.
+
+      :arg colorspace: desired **target** colorspace. This **may also be** *None*. In this case, a "masking" pixmap is created: its :attr:`Pixmap.samples` will consist of the source's alpha bytes only.
+      :type colorspace: :ref:`Colorspace`
+
+      :arg source: the source pixmap.
+      :type source: *Pixmap*
+
+   .. method:: __init__(self, source, width, height, [clip])
+
+      **Copy and scale:** Copy *source* pixmap choosing new width and height values. Supports partial copying, and the source colorspace may also be *None*.
+
+      :arg source: the source pixmap.
+      :type source: *Pixmap*
+
+      :arg float width: desired target width.
+
+      :arg float height: desired target height.
+
+      :arg irect_like clip: a region of the source pixmap to take the copy from.
+
+      .. note:: If width or height are not *de facto* integers (meaning e.g. *float(int(width)) != width*), then the pixmap will be created with *alpha = 1*.
+
+   .. method:: __init__(self, source, alpha=1)
+
+      **Copy and add or drop alpha:** Copy *source* and add or drop its alpha channel. Identical copy if *alpha* equals *source.alpha*. If an alpha channel is added, its values will be set to 255.
+
+      :arg source: source pixmap.
+      :type source: *Pixmap*
+
+      :arg bool alpha: whether the target will have an alpha channel, default and mandatory if source colorspace is *None*.
+
+      .. note:: A typical use is separating color and transparency bytes into separate pixmaps. Some applications require this, e.g. *wx.Bitmap.FromBufferAndAlpha()* of *wxPython*:
+
+         >>> # 'pix' is an RGBA pixmap
+         >>> pixcolors = fitz.Pixmap(pix, 0)    # extract the RGB part (drop alpha)
+         >>> pixalpha = fitz.Pixmap(None, pix)  # extract the alpha part
+         >>> bm = wx.Bitmap.FromBufferAndAlpha(pix.width, pix.height, pixcolors.samples, pixalpha.samples)
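Conceptually, the two pixmaps created above split interleaved RGBA samples into a color buffer and an alpha buffer. The plain-Python sketch below does the same on a *bytes* object. It assumes 4 bytes per pixel with alpha last, as in an RGBA pixmap's :attr:`Pixmap.samples`. The helper name is hypothetical.

```python
def split_rgba(samples: bytes):
    """Split interleaved RGBA bytes into (RGB bytes, alpha bytes)."""
    if len(samples) % 4:
        raise ValueError("length must be a multiple of 4")
    rgb = bytearray()
    for i in range(0, len(samples), 4):
        rgb += samples[i:i + 3]   # keep r, g, b
    alpha = samples[3::4]         # every 4th byte is alpha
    return bytes(rgb), bytes(alpha)


rgb, alpha = split_rgba(bytes([255, 0, 0, 128, 0, 255, 0, 255]))
print(list(rgb), list(alpha))  # [255, 0, 0, 0, 255, 0] [128, 255]
```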
+
+
+   .. method:: __init__(self, filename)
+
+      **From a file:** Create a pixmap from *filename*. All properties are inferred from the input. The origin of the resulting pixmap is *(0, 0)*.
+
+      :arg str filename: Path of the image file.
+
+   .. method:: __init__(self, stream)
+
+      **From memory:** Create a pixmap from a memory area. All properties are inferred from the input. The origin of the resulting pixmap is *(0, 0)*.
+
+      :arg bytes,bytearray,BytesIO stream: Data containing a complete, valid image. Could have been created by e.g. *stream = bytearray(open('image.file', 'rb').read())*. Type *bytes* is supported in **Python 3 only**, because *bytes == str* in Python 2 and the method will interpret the stream as a filename.
+
+         *Changed in version 1.14.13:* *io.BytesIO* is now also supported.
+
+
+   .. method:: __init__(self, colorspace, width, height, samples, alpha)
+
+      **From plain pixels:** Create a pixmap from *samples*. Each pixel must be represented by a number of bytes as controlled by the *colorspace* and *alpha* parameters. The origin of the resulting pixmap is *(0, 0)*. This method is useful when raw image data are provided by some other program -- see :ref:`FAQ`.
+
+      :arg colorspace: Colorspace of image.
+      :type colorspace: :ref:`Colorspace`
+
+      :arg int width: image width
+
+      :arg int height: image height
+
+      :arg bytes,bytearray,BytesIO samples:  an area containing all pixels of the image. Must include alpha values if specified.
+
+         *Changed in version 1.14.13:* (1) *io.BytesIO* can now also be used. (2) Data are now **copied** to the pixmap, so may safely be deleted or become unavailable.
+
+      :arg bool alpha: whether a transparency channel is included.
+
+      .. note::
+
+         1. The following equation **must be true**: *(colorspace.n + alpha) * width * height == len(samples)*.
+         2. Starting with version 1.14.13, the samples data are **copied** to the pixmap.
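The length rule in the note above is easy to check before calling the constructor. The plain-Python sketch below assumes an RGB colorspace (*colorspace.n == 3*), and the helper name is hypothetical.

```python
def expected_samples_len(n_colors: int, width: int, height: int, alpha: bool) -> int:
    """(colorspace.n + alpha) * width * height, as required by the constructor."""
    return (n_colors + int(alpha)) * width * height


width, height = 4, 2
# A solid red RGB image: one (255, 0, 0) triple per pixel, no alpha.
samples = bytes([255, 0, 0]) * (width * height)
assert len(samples) == expected_samples_len(3, width, height, False)  # 24 bytes
```

Such a buffer would then be passed as *fitz.Pixmap(fitz.csRGB, width, height, samples, False)*, following the signature documented above.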
+
+
+   .. method:: __init__(self, doc, xref)
+
+      **From a PDF image:** Create a pixmap from an image **contained in PDF** *doc* identified by its :data:`xref`. All pixmap properties are set by the image. Have a look at `extract-img1.py <https://github.com/pymupdf/PyMuPDF/tree/master/demo/extract-img1.py>`_ and `extract-img2.py <https://github.com/pymupdf/PyMuPDF/tree/master/demo/extract-img2.py>`_ to see how this can be used to recover all of a PDF's images.
+
+      :arg doc: an opened **PDF** document.
+      :type doc: :ref:`Document`
+
+      :arg int xref: the :data:`xref` of an image object. For example, you can make a list of images used on a particular page with :meth:`Document.getPageImageList`, which also shows the :data:`xref` numbers of each image.
+
+   .. method:: clearWith([value [, irect]])
+
+      Initialize the samples area.
+
+      :arg int value: if specified, values from 0 to 255 are valid. Each color byte of each pixel will be set to this value, while alpha will be set to 255 (non-transparent) if present. If omitted, then all bytes (including any alpha) are cleared to *0x00*.
+
+      :arg irect_like irect: the area to be cleared. Omit to clear the whole pixmap. Can only be specified, if *value* is also specified.
+
+   .. method:: tintWith(red, green, blue)
+
+      Colorize (tint) a pixmap with a color provided as an integer triple (red, green, blue). Only colorspaces :data:`CS_GRAY` and :data:`CS_RGB` are supported, others are ignored with a warning.
+
+      If the colorspace is :data:`CS_GRAY`, *(red + green + blue)/3* will be taken as the tint value.
+
+      :arg int red: *red* component.
+
+      :arg int green: *green* component.
+
+      :arg int blue: *blue* component.
+
+   .. method:: gammaWith(gamma)
+
+      Apply a gamma factor to a pixmap, i.e. lighten or darken it. Pixmaps with colorspace *None* are ignored with a warning.
+
+      :arg float gamma: *gamma = 1.0* does nothing, *gamma < 1.0* lightens, *gamma > 1.0* darkens the image.
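To see why *gamma < 1.0* lightens, consider the common power-law formula *out = 255 * (in / 255) ** gamma*, applied to each color byte. This textbook definition is an assumption here. MuPDF's actual implementation may differ in details such as rounding. The helper name is hypothetical.

```python
def apply_gamma(value: int, gamma: float) -> int:
    """Power-law gamma correction of a single byte value (0..255)."""
    return round(255 * (value / 255) ** gamma)


print(apply_gamma(64, 1.0))  # 64  (unchanged)
print(apply_gamma(64, 0.5))  # 128 (lighter)
print(apply_gamma(64, 2.0))  # 16  (darker)
```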
+
+   .. method:: shrink(n)
+
+      Shrink the pixmap by dividing both its width and its height by 2\ :sup:`n`.
+
+      :arg int n: determines the new pixmap (samples) size. For example, a value of 2 divides width and height by 4 and thus results in a size of one 16\ :sup:`th` of the original. Values less than 1 are ignored with a warning.
+
+      .. note:: Use this method to reduce a pixmap's size while retaining its proportions. The pixmap is changed "in place". If you want to keep the original and also have more granular choices, use the "copy and scale" constructor above.
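The effect of *n* on the pixmap size can be computed up front. This sketch assumes that width and height are divisible by 2**n. How MuPDF rounds other dimensions is not covered here. The helper name is hypothetical.

```python
def shrunk_size(width: int, height: int, n: int):
    """New (width, height) after dividing both by 2**n."""
    factor = 2 ** n
    return width // factor, height // factor


w, h = shrunk_size(800, 600, 2)
print(w, h)                   # 200 150
print((w * h) / (800 * 600))  # 0.0625, i.e. 1/16 of the original
```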
+
+   .. method:: pixel(x, y)
+
+      *New in version 1.14.5:* Return the value of the pixel at location (x, y) (column, line).
+
+      :arg int x: the column number of the pixel. Must be in *range(pix.width)*.
+      :arg int y: the line number of the pixel. Must be in *range(pix.height)*.
+
+      :rtype: list
+      :returns: a list of color values and, potentially the alpha value. Its length and content depend on the pixmap's colorspace and the presence of an alpha. For RGBA pixmaps the result would e.g. be *[r, g, b, a]*. All items are integers in *range(256)*.
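Because :attr:`Pixmap.samples` stores pixels row by row, a pixel's bytes can also be located by arithmetic, in the spirit of *pix.pixel(x, y)*. The sketch below works on a plain *bytes* object that stands in for the samples. It takes the row length (:attr:`Pixmap.stride`) and bytes per pixel (:attr:`Pixmap.n`) as parameters. The helper name is hypothetical.

```python
def pixel_at(samples: bytes, stride: int, n: int, x: int, y: int):
    """Return the n component values of pixel (x, y) as a list."""
    offset = y * stride + x * n  # skip y full rows, then x pixels
    return list(samples[offset:offset + n])


# A 2 x 2 RGB image: red, green / blue, white
width, n = 2, 3
samples = bytes([255, 0, 0, 0, 255, 0,
                 0, 0, 255, 255, 255, 255])
print(pixel_at(samples, width * n, n, 0, 1))  # [0, 0, 255]
```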
+
+   .. method:: setPixel(x, y, color)
+
+      *New in version 1.14.7:* Set the color of the pixel at location (x, y) (column, line).
+
+      :arg int x: the column number of the pixel. Must be in *range(pix.width)*.
+      :arg int y: the line number of the pixel. Must be in *range(pix.height)*.
+      :arg sequence color: the desired color given as a sequence of integers in *range(256)*. The length of the sequence must equal :attr:`Pixmap.n`, which includes any alpha byte.
+
+   .. method:: setRect(irect, color)
+
+      *New in version 1.14.8:* Set the pixels of a rectangle to a color.
+
+      :arg irect_like irect: the rectangle to be filled with the color. The actual area is the intersection of this parameter and :attr:`Pixmap.irect`. For an empty intersection (or an invalid parameter), no change will happen.
+      :arg sequence color: the desired color given as a sequence of integers in *range(256)*. The length of the sequence must equal :attr:`Pixmap.n`, which includes any alpha byte.
+
+      :rtype: bool
+      :returns: *False* if the rectangle was invalid or had an empty intersection with :attr:`Pixmap.irect`, else *True*.
+
+      .. note::
+
+         1. This method is equivalent to :meth:`Pixmap.setPixel` executed for each pixel in the rectangle, but is **much faster** if many pixels are involved.
+         2. This method can be used similarly to :meth:`Pixmap.clearWith` to initialize a pixmap with a certain color like this: *pix.setRect(pix.irect, (255, 255, 0))* (RGB example, colors the complete pixmap with yellow).
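The "actual area" affected by :meth:`Pixmap.setRect` is a plain rectangle intersection. The sketch below computes it with tuples *(x0, y0, x1, y1)* and mirrors the documented return value, which is *False* for an empty intersection. It is illustrative rather than the library's code, and the helper name is hypothetical.

```python
def intersect(r1, r2):
    """Intersection of two rectangles, or None if it is empty."""
    x0, y0 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x1, y1 = min(r1[2], r2[2]), min(r1[3], r2[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


pix_irect = (0, 0, 100, 50)  # assumed pixmap rectangle
print(intersect(pix_irect, (80, 40, 200, 200)))             # (80, 40, 100, 50)
print(intersect(pix_irect, (150, 0, 200, 10)) is not None)  # False
```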
+
+   .. method:: setResolution(xres, yres)
+
+      *(New in v1.16.17)* Set the resolution (dpi) in x and y direction.
+
+      :arg int xres: resolution in x direction.
+      :arg int yres: resolution in y direction.
+
+      .. note:: This is just documentary information. In MuPDF, this will not have other implications and will not be written to images created from the pixmap.
+
+
+   .. method:: setAlpha([alphavalues])
+
+      Change the alpha values. The pixmap must have an alpha channel.
+
+      :arg bytes,bytearray,BytesIO alphavalues: the new alpha values. If provided, its length must be at least *width * height*. If omitted, all alpha values are set to 255 (no transparency).
+
+         *Changed in version 1.14.13:* *io.BytesIO* is now also supported.
+
+
+   .. method:: invertIRect([irect])
+
+      Invert the color of all pixels in :ref:`IRect` *irect*. Will have no effect if colorspace is *None*.
+
+      :arg irect_like irect: The area to be inverted. Omit to invert everything.
+
+   .. method:: copyPixmap(source, irect)
+
+      Copy the *irect* part of the *source* pixmap into the corresponding area of this one. The two pixmaps may have different dimensions and can each have :data:`CS_GRAY` or :data:`CS_RGB` colorspaces, but they currently **must** have the same alpha property [#f2]_. The copy mechanism automatically adjusts discrepancies between source and target like so:
+
+      If copying from :data:`CS_GRAY` to :data:`CS_RGB`, the source gray-shade value will be put into each of the three rgb component bytes. If the other way round, *(r + g + b) / 3* will be taken as the gray-shade value of the target.
+
+      First, the intersection of *irect* and the target pixmap's rectangle is calculated. This takes into account the rectangle coordinates and the current attribute values *source.x* and *source.y* (which you are free to modify for this purpose). Then the corresponding data of this intersection are copied. If the intersection is empty, nothing will happen.
+
+      :arg source: source pixmap.
+      :type source: :ref:`Pixmap`
+
+      :arg irect_like irect: The area to be copied.
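The gray/RGB adjustment described for *copyPixmap* amounts to the following per-pixel rules. This is a sketch on raw sample bytes without alpha, not MuPDF's code. *gray_to_rgb* and *rgb_to_gray* are hypothetical names, and the integer division in the RGB-to-gray direction is an assumption (a stored byte must be an integer).

```python
def gray_to_rgb(gray: bytes) -> bytes:
    """CS_GRAY -> CS_RGB: put each gray value into all three components."""
    return bytes(v for g in gray for v in (g, g, g))


def rgb_to_gray(rgb: bytes) -> bytes:
    """CS_RGB -> CS_GRAY: take (r + g + b) / 3, truncated to an integer."""
    return bytes(
        (rgb[i] + rgb[i + 1] + rgb[i + 2]) // 3 for i in range(0, len(rgb), 3)
    )


print(list(rgb_to_gray(bytes([30, 60, 90, 255, 255, 255]))))  # [60, 255]
```

Converting gray to RGB and back returns the original gray values, while the reverse round trip generally does not.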
+
+   .. method:: writeImage(filename, output=None)
+
+      Save pixmap as an image file. Depending on the output chosen, only some or all colorspaces are supported and different file extensions can be chosen. Please see the table below. Since MuPDF v1.10a the *savealpha* option is no longer supported and will be silently ignored.
+
+      :arg str filename: The filename to save to. The filename's extension determines the image format, unless overridden by the *output* parameter.
+
+      :arg str output: The requested image format. The default is the filename's extension. If not recognized, *png* is assumed. For other possible values see :ref:`PixmapOutput`.
+
+   .. method:: writePNG(filename)
+
+      Equal to *pix.writeImage(filename, "png")*.
+
+   .. method:: getImageData(output="png")
+
+      *New in version 1.14.5:* Return the pixmap as a *bytes* memory object of the specified format -- similar to :meth:`writeImage`.
+
+      :arg str output: The requested image format. The default is "png" for which this function equals :meth:`getPNGData`. For other possible values see :ref:`PixmapOutput`.
+
+      :rtype: bytes
+
+   .. method:: getPNGdata()
+
+   .. method:: getPNGData()
+
+      Equal to *pix.getImageData("png")*.
+
+      :rtype: bytes
+
+   ..  method:: pillowWrite(*args, **kwargs)
+
+      *(New in v1.17.3)*
+
+      Write the pixmap as an image file using Pillow. Use this method for image formats or extended image features not supported by MuPDF. Examples are:
+
+      * Formats JPEG, JPX, J2K, WebP, etc.
+      * Storing EXIF or dpi information.
+
+      If you do not provide dpi information, the values stored with the pixmap are used automatically.
+
+      A simple example: ``pix.pillowWrite("some.jpg", optimize=True, dpi=(150,150))``. For details on possible parameters see the Pillow documentation.
+
+   ..  method:: pillowData(*args, **kwargs)
+
+      Return the pixmap as a bytes object in the specified format using Pillow. For example ``stream = pix.pillowData(format="JPEG", optimize=True)``. For details on possible parameters see the Pillow documentation.
+
+
+   .. attribute:: alpha
+
+      Indicates whether the pixmap contains transparency information.
+
+      :type: bool
+
+   .. attribute:: colorspace
+
+      The colorspace of the pixmap. This value may be *None* if the image is to be treated as a so-called *image mask* or *stencil mask* (currently happens for extracted PDF document images only).
+
+      :type: :ref:`Colorspace`
+
+   .. attribute:: stride
+
+      Contains the length of one row of image data in :attr:`Pixmap.samples`. This is primarily used for calculation purposes. The following expressions are true:
+
+      * *len(samples) == height * stride*
+      * *width * n == stride*.
+
+      :type: int
+
+   .. attribute:: irect
+
+      Contains the :ref:`IRect` of the pixmap.
+
+      :type: :ref:`IRect`
+
+   .. attribute:: samples
+
+      The color and (if :attr:`Pixmap.alpha` is true) transparency values for all pixels. It is an area of *width * height * n* bytes. Each n bytes define one pixel. Each successive n bytes yield another pixel in scanline order. Subsequent scanlines follow each other with no padding. E.g. for an RGBA colorspace this means, *samples* is a sequence of bytes like *..., R, G, B, A, ...*, and the four byte values R, G, B, A define one pixel.
+
+      This area can be passed to other graphics libraries like PIL (Python Imaging Library) to do additional processing like saving the pixmap in other image formats.
+
+      .. note::
+         * The underlying data is typically a **large** memory area from which a *bytes* copy is made for this attribute: for example, an RGB-rendered letter page has a samples size of almost 1.4 MB. So consider assigning it to a variable if you use it repeatedly.
+         * Any changes to the underlying data are available only after again accessing this attribute.
+
+      :type: bytes
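Because rows follow each other without padding, the bytes of a single pixel can be located with *stride* and *n* alone. A minimal sketch, assuming *samples* was fetched once (e.g. *samples = pix.samples*, as recommended in the note above) and that x and y are counted from the pixmap's top-left corner rather than in :attr:`Pixmap.x` / :attr:`Pixmap.y` coordinates. *pixel_at* is a hypothetical helper.

```python
def pixel_at(samples: bytes, stride: int, n: int, x: int, y: int) -> bytes:
    """Return the n component bytes of pixel (x, y)."""
    start = y * stride + x * n  # skip y full rows, then x pixels of n bytes each
    return samples[start:start + n]


# a 2 x 2 RGBA example: n = 4, hence stride = width * n = 8
width, height, n = 2, 2, 4
stride = width * n
samples = bytes(range(width * height * n))
assert len(samples) == height * stride
print(list(pixel_at(samples, stride, n, 1, 1)))  # [12, 13, 14, 15]
```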
+
+   .. attribute:: size
+
+      Contains *len(pixmap)*. This will generally equal *len(pix.samples)* plus some platform-specific value for defining other attributes of the object.
+
+      :type: int
+
+   .. attribute:: width
+
+   .. attribute:: w
+
+      Width of the region in pixels.
+
+      :type: int
+
+   .. attribute:: height
+
+   .. attribute:: h
+
+      Height of the region in pixels.
+
+      :type: int
+
+   .. attribute:: x
+
+      X-coordinate of top-left corner
+
+      :type: int
+
+   .. attribute:: y
+
+      Y-coordinate of top-left corner
+
+      :type: int
+
+   .. attribute:: n
+
+      Number of components per pixel. This number depends on colorspace and alpha. If colorspace is not *None*, then *Pixmap.n - Pixmap.alpha == Pixmap.colorspace.n* is true. If colorspace is *None* (stencil masks), then *n == alpha == 1*.
+
+      :type: int
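The relation between *n*, the colorspace and *alpha* can be spelled out as follows. This sketch assumes the usual component counts (1 for gray, 3 for RGB, 4 for CMYK) and uses colorspace names instead of :ref:`Colorspace` objects. *expected_n* is a hypothetical helper.

```python
# assumed component counts of the colorspaces mentioned in this chapter
CS_COMPONENTS = {"gray": 1, "rgb": 3, "cmyk": 4}


def expected_n(colorspace, alpha):
    """Pixmap.n for a colorspace name (None for stencil masks) and an alpha flag."""
    if colorspace is None:
        return 1  # stencil mask: n == alpha == 1
    return CS_COMPONENTS[colorspace] + int(alpha)


print(expected_n("rgb", True))  # 4, i.e. an RGBA pixmap
```

Multiplying the result by the pixmap width gives :attr:`Pixmap.stride`.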
+
+   .. attribute:: xres
+
+      Horizontal resolution in dpi (dots per inch). Please also see :data:`resolution`.
+
+      :type: int
+
+   .. attribute:: yres
+
+      Vertical resolution in dpi. Please also see :data:`resolution`.
+
+      :type: int
+
+   .. attribute:: interpolate
+
+      An information-only boolean flag set to *True* if the image will be drawn using "linear interpolation". If *False*, "nearest neighbour sampling" will be used.
+
+      :type: bool
+
+.. _ImageFiles:
+
+Supported Input Image Formats
+-----------------------------------------------
+The following file types are supported as **input** to construct pixmaps: **BMP, JPEG, GIF, TIFF, JXR, JPX**, **PNG**, **PAM** and all of the **Portable Anymap** family (**PBM, PGM, PNM, PPM**). This support is two-fold:
+
+1. Directly create a pixmap with *Pixmap(filename)* or *Pixmap(bytearray)*. The pixmap will then have properties as determined by the image.
+
+2. Open such files with *fitz.open(...)*. The result will then appear as a document containing one single page. Creating a pixmap of this page offers all the options available in this context: apply a matrix, choose colorspace and alpha, confine the pixmap to a clip area, etc.
+
+**SVG images** are only supported via method 2 above, not directly as pixmaps. But remember: the result of this is a **raster image** as is always the case with pixmaps [#f1]_.
+
+.. _PixmapOutput:
+
+Supported Output Image Formats
+---------------------------------------------------------------------------
+A number of image **output** formats are supported. You can either write an image directly to a file (:meth:`Pixmap.writeImage`) or generate a bytes object (:meth:`Pixmap.getImageData`). Both methods accept a short string identifying the desired format (**Format** column below). Please note that not all combinations of pixmap colorspace, transparency support (alpha) and image format are possible.
+
+========== =============== ========= ============== ===========================
+**Format** **Colorspaces** **alpha** **Extensions** **Description**
+========== =============== ========= ============== ===========================
+pam        gray, rgb, cmyk yes       .pam           Portable Arbitrary Map
+pbm        gray, rgb       no        .pbm           Portable Bitmap
+pgm        gray, rgb       no        .pgm           Portable Graymap
+png        gray, rgb       yes       .png           Portable Network Graphics
+pnm        gray, rgb       no        .pnm           Portable Anymap
+ppm        gray, rgb       no        .ppm           Portable Pixmap
+ps         gray, rgb, cmyk no        .ps            Adobe PostScript Image
+psd        gray, rgb, cmyk yes       .psd           Adobe Photoshop Document
+========== =============== ========= ============== ===========================
+
+.. note::
+    * Not all image file types are supported (or at least common) on all OS platforms. E.g. PAM and the Portable Anymap formats are rare or even unknown on Windows.
+    * Especially pertaining to CMYK colorspaces, you can always convert a CMYK pixmap to an RGB pixmap with *rgb_pix = fitz.Pixmap(fitz.csRGB, cmyk_pix)* and then save that in the desired format.
+    * As can be seen, MuPDF's image support range is different for input and output. Among those supported both ways, PNG is probably the most popular. We recommend using Pillow whenever you face a support gap.
+    * We also recommend using "ppm" formats as input to tkinter's *PhotoImage* method like this: *tkimg = tkinter.PhotoImage(data=pix.getImageData("ppm"))* (also see the tutorial). This is **very** fast (**60 times** faster than PNG) and will work under Python 2 or 3.
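The table can also be checked programmatically before calling :meth:`Pixmap.writeImage` or :meth:`Pixmap.getImageData`. The following is a direct transcription of the table into a dictionary, not a PyMuPDF API. *can_write* is a hypothetical helper, and it treats "no" in the alpha column as "not supported".

```python
# transcription of the "Supported Output Image Formats" table:
# format -> (supported colorspaces, alpha supported)
OUTPUT_FORMATS = {
    "pam": ({"gray", "rgb", "cmyk"}, True),
    "pbm": ({"gray", "rgb"}, False),
    "pgm": ({"gray", "rgb"}, False),
    "png": ({"gray", "rgb"}, True),
    "pnm": ({"gray", "rgb"}, False),
    "ppm": ({"gray", "rgb"}, False),
    "ps": ({"gray", "rgb", "cmyk"}, False),
    "psd": ({"gray", "rgb", "cmyk"}, True),
}


def can_write(fmt, colorspace, alpha):
    """True if the table lists this format/colorspace/alpha combination."""
    entry = OUTPUT_FORMATS.get(fmt.lower())
    if entry is None:
        return False  # not an output format of this table
    colorspaces, alpha_ok = entry
    return colorspace in colorspaces and (alpha_ok or not alpha)


print(can_write("png", "cmyk", False))  # False: convert to RGB first
```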
+
+
+
+.. rubric:: Footnotes
+
+.. [#f1] If you need a **vector image** from the SVG, you must first convert it to a PDF. Try :meth:`Document.convertToPDF`. If this is not good enough, look for other SVG-to-PDF conversion tools like the Python packages `svglib <https://pypi.org/project/svglib>`_, `CairoSVG <https://pypi.org/project/cairosvg>`_, `Uniconvertor <https://sk1project.net/modules.php?name=Products&product=uniconvertor&op=download>`_ or the Java solution `Apache Batik <https://github.com/apache/batik>`_. Have a look at our Wiki for more examples.
+
+.. [#f2] To change the alpha property as well, add a step after this method that drops or adds an alpha channel on the result.
diff --git a/docs/point.rst b/docs/point.rst

new file mode 100644 (file)

index 0000000..aa1be6d
--- /dev/null
+++ b/docs/point.rst
@@ -0,0 +1,102 @@
+.. _Point:
+
+================
+Point
+================
+
+*Point* represents a point in the plane, defined by its x and y coordinates.
+
+============================ ============================================
+**Attribute / Method**       **Description**
+============================ ============================================
+:meth:`Point.distance_to`    calculate distance to point or rect
+:meth:`Point.norm`           the Euclidean norm
+:meth:`Point.transform`      transform point with a matrix
+:attr:`Point.abs_unit`       same as unit, but positive coordinates
+:attr:`Point.unit`           point coordinates divided by *abs(point)*
+:attr:`Point.x`              the X-coordinate
+:attr:`Point.y`              the Y-coordinate
+============================ ============================================
+
+**Class API**
+
+.. class:: Point
+
+   .. method:: __init__(self)
+
+   .. method:: __init__(self, x, y)
+
+   .. method:: __init__(self, point)
+
+   .. method:: __init__(self, sequence)
+
+      Overloaded constructors.
+
+      Without parameters, *Point(0, 0)* will be created.
+
+      With another point specified, a **new copy** will be created. "sequence" is a Python sequence of 2 numbers (see :ref:`SequenceTypes`).
+
+      :arg float x: x coordinate of the point
+
+      :arg float y: y coordinate of the point
+
+   .. method:: distance_to(x [, unit])
+
+      Calculate the distance to *x*, which may be :data:`point_like` or :data:`rect_like`. The distance is given in units of either pixels (default), inches, centimeters or millimeters.
+
+      :arg point_like,rect_like x: the object to which the distance is computed.
+
+      :arg str unit: the unit to be measured in. One of "px", "in", "cm", "mm".
+
+      :rtype: float
+      :returns: the distance to *x*. If this is :data:`rect_like`, then the distance
+
+         * is the length of the shortest line connecting to one of the rectangle sides
+         * is calculated to the **finite version** of it
+         * is zero if it **contains** the point
+
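For a :data:`rect_like` argument, the three rules of *distance_to* amount to measuring the distance to the nearest point of the (finite) rectangle. The sketch below is a pure-Python model, not PyMuPDF's code. It assumes a pixel is 1/72 inch (one PDF point) and 1 inch is 2.54 cm; *distance_to_rect* and *UNIT_FACTORS* are hypothetical names.

```python
import math

# assumed unit conversion: 72 px per inch, 2.54 cm per inch
UNIT_FACTORS = {"px": 1.0, "in": 1 / 72, "cm": 2.54 / 72, "mm": 25.4 / 72}


def distance_to_rect(p, r, unit="px"):
    """Distance from point p = (x, y) to rectangle r = (x0, y0, x1, y1)."""
    x0, x1 = sorted((r[0], r[2]))  # use the finite version of the rectangle
    y0, y1 = sorted((r[1], r[3]))
    dx = max(x0 - p[0], 0, p[0] - x1)  # 0 if p lies between left and right side
    dy = max(y0 - p[1], 0, p[1] - y1)  # 0 if p lies between top and bottom side
    return math.hypot(dx, dy) * UNIT_FACTORS[unit]


print(distance_to_rect((0, 0), (3, 4, 10, 10)))  # 5.0
```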
+   .. method:: norm()
+
+      *(New in version 1.16.0)*
+      
+      Return the Euclidean norm (the length) of the point as a vector. It equals the result of *abs(point)*.
+
+   .. method:: transform(m)
+
+      Apply a matrix to the point and replace it with the result.
+
+      :arg matrix_like m: The matrix to be applied.
+
+      :rtype: :ref:`Point`
+
+   .. attribute:: unit
+
+      Result of dividing each coordinate by *abs(point)*, the distance of the point to (0, 0). This is a vector of length 1 pointing in the same direction as the point. Its x and y values equal the cosine and sine, respectively, of the angle this vector (and the point itself) makes with the x-axis.
+
+      .. image:: images/img-point-unit.jpg
+
+      :type: :ref:`Point`
+
+   .. attribute:: abs_unit
+
+      Same as :attr:`unit` above, replacing the coordinates with their absolute values.
+
+      :type: :ref:`Point`
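Numerically, :attr:`unit` and :attr:`abs_unit` can be sketched with plain floats as follows. This is an illustration, not PyMuPDF's code; returning (0, 0) for the zero point, which has no direction, is an assumption of this sketch.

```python
import math


def unit(x, y):
    """(x, y) divided by its Euclidean norm: (cos(angle), sin(angle))."""
    r = math.hypot(x, y)
    if r == 0:
        return (0.0, 0.0)  # assumption: the zero vector has no direction
    return (x / r, y / r)


def abs_unit(x, y):
    """Same as unit(), with absolute coordinate values."""
    ux, uy = unit(x, y)
    return (abs(ux), abs(uy))


print(unit(3, 4))  # (0.6, 0.8)
```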
+
+   .. attribute:: x
+
+      The x coordinate
+
+      :type: float
+
+   .. attribute:: y
+
+      The y coordinate
+
+      :type: float
+
+.. note::
+
+   * This class adheres to the Python sequence protocol, so components can be accessed via their index, too. Also refer to :ref:`SequenceTypes`.
+   * Points can be used with arithmetic operators -- see chapter :ref:`Algebra`.
+
diff --git a/docs/pymupdf-logo.jpg b/docs/pymupdf-logo.jpg

new file mode 100644 (file)

index 0000000..560e6f1

Binary files /dev/null and b/docs/pymupdf-logo.jpg differ
diff --git a/docs/quad.rst b/docs/quad.rst

new file mode 100644 (file)

index 0000000..bdbf3a1
--- /dev/null
+++ b/docs/quad.rst
@@ -0,0 +1,143 @@
+.. _Quad:
+
+==========
+Quad
+==========
+
+Represents a four-sided mathematical shape (also called "quadrilateral" or "tetragon") in the plane, defined as a sequence of four :ref:`Point` objects ul, ur, ll, lr (conveniently called upper left, upper right, lower left, lower right).
+
+Quads can **be obtained** as results of text search methods (:meth:`Page.searchFor`), and they **are used** to define text marker annotations (see e.g. :meth:`Page.addSquigglyAnnot` and friends) and in several draw methods (like :meth:`Page.drawQuad` / :meth:`Shape.drawQuad`, :meth:`Page.drawOval` / :meth:`Shape.drawOval`).
+
+.. note::
+
+   * If the corners of a rectangle are transformed with a **rotation**, **scale** or **translation** :ref:`Matrix`, then the resulting quad is **rectangular**, i.e. its corners again enclose angles of 90 degrees. Property :attr:`Quad.isRectangular` checks whether a quad can be thought of as the result of such an operation. This is not true for all matrices: e.g. shear matrices produce parallelograms, and non-invertible matrices deliver "degenerate" tetragons like triangles or lines.
+
+   * Attribute :attr:`Quad.rect` obtains the enveloping rectangle. Vice versa, rectangles now have attributes :attr:`Rect.quad`, resp. :attr:`IRect.quad` to obtain their respective tetragon versions.
+
+
+============================= =======================================================
+**Methods / Attributes**      **Short Description**
+============================= =======================================================
+:meth:`Quad.transform`        transform with a matrix
+:meth:`Quad.morph`            transform with a point and matrix
+:attr:`Quad.ul`               upper left point
+:attr:`Quad.ur`               upper right point
+:attr:`Quad.ll`               lower left point
+:attr:`Quad.lr`               lower right point
+:attr:`Quad.isConvex`         true if quad is a convex set
+:attr:`Quad.isEmpty`          true if quad is an empty set
+:attr:`Quad.isRectangular`    true if quad is a (rotated) rectangle
+:attr:`Quad.rect`             smallest containing :ref:`Rect`
+:attr:`Quad.width`            the longest width value
+:attr:`Quad.height`           the longest height value
+============================= =======================================================
+
+**Class API**
+
+.. class:: Quad
+
+   .. method:: __init__(self)
+
+   .. method:: __init__(self, ul, ur, ll, lr)
+
+   .. method:: __init__(self, quad)
+
+   .. method:: __init__(self, sequence)
+
+      Overloaded constructors: "ul", "ur", "ll", "lr" stand for :data:`point_like` objects (the four corners), "sequence" is a Python sequence with four :data:`point_like` objects.
+
+      If "quad" is specified, the constructor creates a **new copy** of it.
+
+      Without parameters, a quad consisting of 4 copies of *Point(0, 0)* is created.
+
+
+   .. method:: transform(matrix)
+
+      Modify the quadrilateral by transforming each of its corners with a matrix.
+
+      :arg matrix_like matrix: the matrix.
+
+   .. method:: morph(fixpoint, matrix)
+
+      *(New in version 1.17.0)* "Morph" the quad with a matrix-like using a point-like as fixed point.
+
+      :arg point_like fixpoint: the point.
+      :arg matrix_like matrix: the matrix.
+      :returns: a new quad. The effect is achieved by using the following code::
+
+         >>> T = fitz.Matrix(1, 1).preTranslate(fixpoint.x, fixpoint.y)
+         >>> result = self * ~T * matrix * T
+
+      So the quad is translated so that the fixpoint becomes the origin (0, 0), then the matrix is applied to it, and finally the reverse translation is done.
+
+      Typical uses include rotating the quad around a desired point.
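The composition ``~T * matrix * T`` used by *morph* can be reproduced with plain tuples to see why the fixpoint stays in place. The sketch assumes PDF's row-vector convention, where a matrix (a, b, c, d, e, f) maps (x, y) to ``(a*x + c*y + e, b*x + d*y + f)``, and that a rotation by *deg* degrees is (cos, sin, -sin, cos, 0, 0). All function names are hypothetical and none of this is PyMuPDF's code.

```python
import math

# A matrix is a tuple (a, b, c, d, e, f) acting on row vectors:
#   x' = a*x + c*y + e,  y' = b*x + d*y + f


def concat(m1, m2):
    """Product m1 * m2: the result applies m1 first, then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def apply(m, p):
    a, b, c, d, e, f = m
    x, y = p
    return (a * x + c * y + e, b * x + d * y + f)


def translate(tx, ty):
    return (1, 0, 0, 1, tx, ty)


def rotate(deg):
    s, c = math.sin(math.radians(deg)), math.cos(math.radians(deg))
    return (c, s, -s, c, 0, 0)


def morph_points(points, fixpoint, m):
    """Apply m with fixpoint as origin, i.e. the product ~T * m * T."""
    t = translate(fixpoint[0], fixpoint[1])
    t_inv = translate(-fixpoint[0], -fixpoint[1])  # ~T for a pure translation
    full = concat(concat(t_inv, m), t)
    return [apply(full, p) for p in points]


# rotate the corner (2, 1) by 90 degrees around (1, 1)
print([(round(x, 9), round(y, 9)) for x, y in morph_points([(2, 1)], (1, 1), rotate(90))])
# [(1.0, 2.0)]
```

Passing the four corners of a quad as *points* corresponds to morphing the whole quad.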
+
+   .. attribute:: rect
+
+      The smallest rectangle containing the quad, represented by the blue area in the following picture.
+
+      .. image:: images/img-quads.jpg
+
+      :type: :ref:`Rect`
+
+   .. attribute:: ul
+
+      Upper left point.
+
+      :type: :ref:`Point`
+
+   .. attribute:: ur
+
+      Upper right point.
+
+      :type: :ref:`Point`
+
+   .. attribute:: ll
+
+      Lower left point.
+
+      :type: :ref:`Point`
+
+   .. attribute:: lr
+
+      Lower right point.
+
+      :type: :ref:`Point`
+
+   .. attribute:: isConvex
+
+      *(New in version 1.16.1)*
+      
+      True if every line connecting two points of the quad is inside the quad. In addition, we make sure that the quad is not "degenerate", i.e. not all corners are on the same line (which would still qualify as convexity in the mathematical sense).
+
+      :type: bool
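One way to test the property described for :attr:`isConvex` is to walk around the quad and check that every turn goes in the same direction. This is a sketch of the definition, not PyMuPDF's algorithm. Note that the outline runs ul, ur, lr, ll, which differs from the order in which the corners are stored. *is_convex* and *cross* are hypothetical names.

```python
def cross(o, a, b):
    """z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_convex(ul, ur, ll, lr):
    """Convex and not all corners on one line."""
    outline = [ul, ur, lr, ll]  # walk around the quad, not in storage order
    turns = [cross(outline[i], outline[(i + 1) % 4], outline[(i + 2) % 4]) for i in range(4)]
    has_left = any(t > 0 for t in turns)
    has_right = any(t < 0 for t in turns)
    # exactly one turn direction: mixed means not convex, none means all collinear
    return has_left != has_right


print(is_convex((0, 0), (1, 0), (0, 1), (1, 1)))  # True: unit square
print(is_convex((0, 0), (1, 0), (1, 1), (0, 1)))  # False: ll and lr swapped ("bow tie")
```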
+
+   .. attribute:: isEmpty
+
+      True if enclosed area is zero, which means that at least three of the four corners are on the same line. If this is false, the quad may still be degenerate or not look like a tetragon at all (triangles, parallelograms, trapezoids, ...).
+
+      :type: bool
+
+   .. attribute:: isRectangular
+
+      True if all corner angles are 90 degrees. This implies that the quad is **convex and not empty**.
+
+      :type: bool
+
+   .. attribute:: width
+
+      The maximum length of the top and the bottom side.
+
+      :type: float
+
+   .. attribute:: height
+
+      The maximum length of the left and the right side.
+
+      :type: float
+
+Remark
+------
+This class adheres to the sequence protocol, so components can be dealt with via their indices, too. Also refer to :ref:`SequenceTypes`.
+
+We are still in the process of extending algebraic operations to quads. Multiplication and division by numbers and matrices are already defined. Addition, subtraction and unary operations may follow when we see an actual need.
diff --git a/docs/rect.rst b/docs/rect.rst

new file mode 100644 (file)

index 0000000..381b4d9
--- /dev/null
+++ b/docs/rect.rst
@@ -0,0 +1,270 @@
+.. _Rect:
+
+==========
+Rect
+==========
+
+*Rect* represents a rectangle defined by four floating point numbers x0, y0, x1, y1. They are treated as being coordinates of two diagonally opposite points. The first two numbers are regarded as the "top left" corner P\ :sub:`x0,y0` and P\ :sub:`x1,y1` as the "bottom right" one. However, these two properties need not coincide with their intuitive meanings -- read on.
+
+The following remarks are also valid for :ref:`IRect` objects:
+
+* Rectangle borders are always parallel to the respective X- and Y-axes.
+* The constructing points can be anywhere in the plane -- they need not even be different, and e.g. "top left" need not be the geometrical "north-western" point.
+* For any given quadruple of numbers, the geometrically "same" rectangle can be defined in (up to) four different ways: Rect(P\ :sub:`x0,y0`, P\ :sub:`x1,y1`\ ), Rect(P\ :sub:`x1,y1`, P\ :sub:`x0,y0`\ ), Rect(P\ :sub:`x0,y1`, P\ :sub:`x1,y0`\ ), and Rect(P\ :sub:`x1,y0`, P\ :sub:`x0,y1`\ ).
+
+Hence some useful classification:
+
+* A rectangle is called **finite** if *x0 <= x1* and *y0 <= y1* (i.e. the bottom right point is "south-east" of the top left one), otherwise **infinite**. Of the four alternatives above, **only one** is finite (disregarding degenerate cases). Note that in MuPDF's coordinate system the y-axis is oriented from **top to bottom**.
+
+* A rectangle is called **empty** if *x0 = x1* or *y0 = y1*, i.e. if its area is zero.
+
+.. note:: It sounds like a paradox: a rectangle can be both infinite **and** empty ...
+
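The two classifications for rectangles translate directly into code. A minimal sketch with rectangles given as plain (x0, y0, x1, y1) tuples; the function names are hypothetical, and the last line shows the "paradox" from the note.

```python
def is_infinite(x0, y0, x1, y1):
    """Not finite: the bottom right point is not south-east of the top left one."""
    return not (x0 <= x1 and y0 <= y1)


def is_empty(x0, y0, x1, y1):
    """Zero area: both corners share an x or a y coordinate."""
    return x0 == x1 or y0 == y1


# the four ways to write the "same" rectangle: only one of them is finite
variants = [(0, 0, 2, 1), (2, 1, 0, 0), (0, 1, 2, 0), (2, 0, 0, 1)]
print([is_infinite(*v) for v in variants])  # [False, True, True, True]

# infinite and empty at the same time
print(is_infinite(10, 0, 0, 0), is_empty(10, 0, 0, 0))  # True True
```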
+============================= =======================================================
+**Methods / Attributes**      **Short Description**
+============================= =======================================================
+:meth:`Rect.contains`         checks containment of another object
+:meth:`Rect.getArea`          calculate rectangle area
+:meth:`Rect.getRectArea`      calculate rectangle area
+:meth:`Rect.includePoint`     enlarge rectangle to also contain a point
+:meth:`Rect.includeRect`      enlarge rectangle to also contain another one
+:meth:`Rect.intersect`        common part with another rectangle
+:meth:`Rect.intersects`       checks for non-empty intersections
+:meth:`Rect.morph`            transform with a point and a matrix
+:meth:`Rect.norm`             the Euclidean norm
+:meth:`Rect.normalize`        makes a rectangle finite
+:meth:`Rect.round`            create smallest :ref:`IRect` containing rectangle
+:meth:`Rect.transform`        transform rectangle with a matrix
+:attr:`Rect.bottom_left`      bottom left point, synonym *bl*
+:attr:`Rect.bottom_right`     bottom right point, synonym *br*
+:attr:`Rect.height`           rectangle height
+:attr:`Rect.irect`            equals result of method *round()*
+:attr:`Rect.isEmpty`          whether rectangle is empty
+:attr:`Rect.isInfinite`       whether rectangle is infinite
+:attr:`Rect.top_left`         top left point, synonym *tl*
+:attr:`Rect.top_right`        top right point, synonym *tr*
+:attr:`Rect.quad`             :ref:`Quad` made from rectangle corners
+:attr:`Rect.width`            rectangle width
+:attr:`Rect.x0`               top left corner's X-coordinate
+:attr:`Rect.x1`               bottom right corner's X-coordinate
+:attr:`Rect.y0`               top left corner's Y-coordinate
+:attr:`Rect.y1`               bottom right corner's Y-coordinate
+============================= =======================================================
+
+**Class API**
+
+.. class:: Rect
+
+   .. method:: __init__(self)
+
+   .. method:: __init__(self, x0, y0, x1, y1)
+
+   .. method:: __init__(self, top_left, bottom_right)
+
+   .. method:: __init__(self, top_left, x1, y1)
+
+   .. method:: __init__(self, x0, y0, bottom_right)
+
+   .. method:: __init__(self, rect)
+
+   .. method:: __init__(self, sequence)
+
+      Overloaded constructors: *top_left*, *bottom_right* stand for :data:`point_like` objects, "sequence" is a Python sequence type of 4 numbers (see :ref:`SequenceTypes`), "rect" means another :data:`rect_like`, while the other parameters mean coordinates.
+
+      If "rect" is specified, the constructor creates a **new copy** of it.
+
+      Without parameters, the empty rectangle *Rect(0.0, 0.0, 0.0, 0.0)* is created.
+
+   .. method:: round()
+
+      Creates the smallest containing :ref:`IRect`. This is **not** the same as simply rounding the rectangle's edges: the top left corner is rounded upwards and to the left, while the bottom right corner is rounded downwards and to the right.
+
+      >>> fitz.Rect(0.5, -0.01, 123.88, 455.123456).round()
+      IRect(0, -1, 124, 456)
+
+      1. If the rectangle is **infinite**, the "normalized" (finite) version of it will be taken. The result of this method is always a finite *IRect*.
+      2. If the rectangle is **empty**, the result is also empty.
+      3. **Possible paradox:** The result may be empty, **even if** the rectangle is **not** empty! In such cases, the result obviously does **not** contain the rectangle. This is because MuPDF's algorithm allows for a small tolerance (1e-3). Example:
+
+      >>> r = fitz.Rect(100, 100, 200, 100.001)
+      >>> r.isEmpty  # rect is NOT empty
+      False
+      >>> r.round()  # but its irect IS empty!
+      IRect(100, 100, 200, 100)
+      >>> r.round().isEmpty
+      True
+
+      :rtype: :ref:`IRect`
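The rounding rules of *round()* can be approximated in a few lines. This sketch normalizes the rectangle first (rule 1) and reproduces both examples above. Where exactly MuPDF applies its 1e-3 tolerance is an assumption here (the rectangle is shrunk by it before rounding outward), and empty input (rule 2) is not special-cased. *round_rect* is a hypothetical name.

```python
import math


def round_rect(x0, y0, x1, y1, tol=1e-3):
    """Smallest integer rectangle containing the finite version of the input."""
    x0, x1 = min(x0, x1), max(x0, x1)  # rule 1: take the finite version
    y0, y1 = min(y0, y1), max(y0, y1)
    # assumed tolerance handling: shrink by tol, round top left down, bottom right up
    return (
        math.floor(x0 + tol), math.floor(y0 + tol),
        math.ceil(x1 - tol), math.ceil(y1 - tol),
    )


print(round_rect(0.5, -0.01, 123.88, 455.123456))  # (0, -1, 124, 456)
print(round_rect(100, 100, 200, 100.001))  # (100, 100, 200, 100), an empty IRect
```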
+
+   .. method:: transform(m)
+
+      Transforms the rectangle with a matrix and **replaces the original**. If the rectangle is empty or infinite, this is a no-operation.
+
+      :arg m: The matrix for the transformation.
+      :type m: :ref:`Matrix`
+
+      :rtype: *Rect*
+      :returns: the smallest rectangle that contains the transformed original.
+
+   .. method:: intersect(r)
+
+      The intersection (common rectangular area) of the current rectangle and *r* is calculated and **replaces the current** rectangle. If either rectangle is empty, the result is also empty. If *r* is infinite, this is a no-operation.
+
+      :arg r: Second rectangle
+      :type r: :ref:`Rect`
+
+   .. method:: includeRect(r)
+
+      The smallest rectangle containing the current one and *r* is calculated and **replaces the current** one. If either rectangle is infinite, the result is also infinite. If one is empty, the other one will be taken as the result.
+
+      :arg r: Second rectangle
+      :type r: :ref:`Rect`
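For finite rectangles, :meth:`intersect` and :meth:`includeRect` reduce to taking maxima and minima of the coordinates. A sketch on (x0, y0, x1, y1) tuples, covering only finite input. Representing "no common area" by (0, 0, 0, 0) is an assumption, and the function names are hypothetical.

```python
def intersect(r1, r2):
    """Common part of two finite rectangles (x0, y0, x1, y1)."""
    x0, y0 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x1, y1 = min(r1[2], r2[2]), min(r1[3], r2[3])
    if x0 >= x1 or y0 >= y1:
        return (0, 0, 0, 0)  # assumption: no common area -> the empty Rect()
    return (x0, y0, x1, y1)


def include_rect(r1, r2):
    """Smallest finite rectangle containing both; an empty operand is ignored."""
    if r1[0] == r1[2] or r1[1] == r1[3]:
        return r2
    if r2[0] == r2[2] or r2[1] == r2[3]:
        return r1
    return (min(r1[0], r2[0]), min(r1[1], r2[1]), max(r1[2], r2[2]), max(r1[3], r2[3]))


print(intersect((0, 0, 10, 10), (5, 5, 20, 20)))  # (5, 5, 10, 10)
print(include_rect((0, 0, 1, 1), (5, 5, 6, 6)))  # (0, 0, 6, 6)
```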
+
+   .. method:: includePoint(p)
+
+      The smallest rectangle containing the current one and point *p* is calculated and **replaces the current** one. **Infinite rectangles remain unchanged.** To create a rectangle containing a series of points, start with (the empty) *fitz.Rect(p1, p1)* and successively perform *includePoint* operations for the other points.
+
+      :arg p: Point to include.
+      :type p: :ref:`Point`
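The recipe given for *includePoint* (start with *fitz.Rect(p1, p1)*, then include each further point) is equivalent to taking coordinate-wise minima and maxima. A pure-Python sketch of that equivalence, with the hypothetical name *bounding_rect*.

```python
def bounding_rect(points):
    """Rectangle (x0, y0, x1, y1) spanned by a non-empty list of (x, y) points."""
    (x0, y0), rest = points[0], points[1:]
    x1, y1 = x0, y0  # start with the empty rectangle Rect(p1, p1)
    for x, y in rest:  # "includePoint" for every further point
        x0, y0 = min(x0, x), min(y0, y)
        x1, y1 = max(x1, x), max(y1, y)
    return (x0, y0, x1, y1)


print(bounding_rect([(3, 4), (1, 7), (5, 2)]))  # (1, 2, 5, 7)
```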
+
+   .. method:: getRectArea([unit])
+
+   .. method:: getArea([unit])
+
+      Calculate the area of the rectangle. Without a parameter, this equals *abs(rect)*. Like an empty rectangle, an infinite rectangle has an area of zero. So at least one of *fitz.Rect(p1, p2)* and *fitz.Rect(p2, p1)* has a zero area.
+
+      :arg str unit: Specify required unit: respective squares of *px* (pixels, default), *in* (inches), *cm* (centimeters), or *mm* (millimeters).
+      :rtype: float
+
+   .. method:: contains(x)
+
+      Checks whether *x* is contained in the rectangle. It may be an *IRect*, *Rect*, *Point* or number. If *x* is an empty rectangle, this is always true. If the rectangle is empty, this is always *False* for all non-empty rectangles and for all points. If *x* is a number, it will be checked against the four components. *x in rect* and *rect.contains(x)* are equivalent.
+
+      :arg x: the object to check.
+      :type x: :ref:`IRect` or :ref:`Rect` or :ref:`Point` or number
+
+      :rtype: bool
+
+   .. method:: intersects(r)
+
+      Checks whether the rectangle and a :data:`rect_like` "r" contain a common non-empty :ref:`Rect`. This will always be *False* if either is infinite or empty.
+
+      :arg rect_like r: the rectangle to check.
+
+      :rtype: bool
+
+   .. method:: morph(fixpoint, matrix)
+
+      *(New in version 1.17.0)*
+      
+      Return a new quad after applying a matrix to it using a fixed point.
+
+      :arg point_like fixpoint: the fixed point.
+      :arg matrix_like matrix: the matrix.
+      :returns: a new :ref:`Quad`. This is a wrapper for the same-named quad method.
+
+   .. method:: norm()
+
+      *(New in version 1.16.0)*
+      
+      Return the Euclidean norm of the rectangle treated as a vector of four numbers.
+
+   .. method:: normalize()
+
+      **Replace** the rectangle with its finite version. This is done by shuffling the rectangle corners. After completion of this method, the bottom right corner will indeed be south-eastern to the top left one.
+
+   .. attribute:: irect
+
+      Equals result of method *round()*.
+
+   .. attribute:: top_left
+
+   .. attribute:: tl
+
+      Equals *Point(x0, y0)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: top_right
+
+   .. attribute:: tr
+
+      Equals *Point(x1, y0)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: bottom_left
+
+   .. attribute:: bl
+
+      Equals *Point(x0, y1)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: bottom_right
+
+   .. attribute:: br
+
+      Equals *Point(x1, y1)*.
+
+      :type: :ref:`Point`
+
+   .. attribute:: quad
+
+      The quadrilateral *Quad(rect.tl, rect.tr, rect.bl, rect.br)*.
+
+      :type: :ref:`Quad`
+
+   .. attribute:: width
+
+      Width of the rectangle. Equals *abs(x1 - x0)*.
+
+      :type: float
+
+   .. attribute:: height
+
+      Height of the rectangle. Equals *abs(y1 - y0)*.
+
+      :type: float
+
+   .. attribute:: x0
+
+      X-coordinate of the left corners.
+
+      :type: float
+
+   .. attribute:: y0
+
+      Y-coordinate of the top corners.
+
+      :type: float
+
+   .. attribute:: x1
+
+      X-coordinate of the right corners.
+
+      :type: float
+
+   .. attribute:: y1
+
+      Y-coordinate of the bottom corners.
+
+      :type: float
+
+   .. attribute:: isInfinite
+
+      *True* if rectangle is infinite, *False* otherwise.
+
+      :type: bool
+
+   .. attribute:: isEmpty
+
+      *True* if rectangle is empty, *False* otherwise.
+
+      :type: bool
+
+.. note::
+
+   * This class adheres to the Python sequence protocol, so components can be accessed via their index, too. Also refer to :ref:`SequenceTypes`.
+   * Rectangles can be used with arithmetic operators -- see chapter :ref:`Algebra`.
+
diff --git a/docs/replace-fonts.py b/docs/replace-fonts.py

new file mode 100644 (file)

index 0000000..6c93b10
--- /dev/null
+++ b/docs/replace-fonts.py
@@ -0,0 +1,105 @@
+"""
+Demo / Experimental: Replace the fonts in a PDF.
+
+"""
+import fitz
+import sys
+
+fname = sys.argv[1]
+
+doc = fitz.open(fname)  # input PDF
+out = fitz.open()  # output PDF
+with open("fonts.csv") as csvfile:  # each line: "old basefont name;Base14 name"
+    csv = csvfile.read().splitlines()
+all_fonts = []  # will contain: (old basefont name, Base14 name)
+for f in csv:
+    all_fonts.append(f.split(";"))
+
+
+def pdf_color(srgb):
+    """Create a PDF color triple from a given sRGB color integer.
+    """
+    b = (srgb % 256) / 255
+    srgb //= 256  # integer division: "/=" would yield a float and corrupt "% 256"
+    g = (srgb % 256) / 255
+    srgb //= 256
+    r = srgb / 255
+    return (r, g, b)
+
+
+def get_font(fontname):
+    """Lookup base fontname and return one of the "reserved" Base14 fontnames.
+    """
+    for f in all_fonts:
+        if f[0] in fontname:  # fontname may look like "ABCDEF+fontname..."
+            return f[1]
+    return "helv"  # default: Helvetica
+
+
+for page in doc:
+    if page.number % 10 == 0:  # just entertainment messages every 10 pages
+        print("Processed %i pages" % page.number)
+    if not page._isWrapped:  # check if input page geometry is dubious
+        page._wrapContents()
+    # for each input page create an output with same dimensions
+    outpage = out.newPage(width=page.rect.width, height=page.rect.height)
+
+    # create a shape to write the output text to.
+    shape = outpage.newShape()
+    text_blocks = []
+    image_blocks = []
+    for block in page.getText("dict")["blocks"]:
+        if block["type"] == 0:
+            text_blocks.append(block)
+        else:
+            image_blocks.append(block)
+
+    # insert the images first, so any text appears in foreground
+    for block in image_blocks:
+        outpage.insertImage(block["bbox"], stream=block["image"])
+        print("Inserted an image on page", page.number)
+
+    for block in text_blocks:  # read text blocks
+        shape.drawRect(block["bbox"])  # draw all text on white background,
+        # because images may cover same area
+
+        for line in block["lines"]:  # for each line in the block ...
+            for span in line["spans"]:  # for each span in the line ...
+                fontname = get_font(span["font"])  # get replacing fontname
+                fontsize = span["size"]
+                text = span["text"]
+                bbox = fitz.Rect(span["bbox"])  # text rectangle on input
+                text_size = fitz.getTextlength(  # measure text length on output
+                    text, fontname=fontname, fontsize=fontsize
+                )
+
+                # reduce fontsize if the text is too long with the new font
+                if text_size > bbox.width:
+                    fontsize *= bbox.width / text_size
+                try:
+                    shape.insertText(  # copy text to output page
+                        bbox.bl,  # insertion point on output page
+                        text,  # the text to insert
+                        fontsize=fontsize,  # fontsize
+                        # decide on output font here: the place for sophistication!
+                        fontname=fontname,
+                        color=pdf_color(span["color"]),
+                    )
+                except ValueError:
+                    print("Method 'insertText' failed:")
+                    print(
+                        "page:",
+                        page.number,
+                        "at",
+                        span["bbox"][:2],
+                        "text:",
+                        span["text"],
+                    )
+        shape.finish(color=None, fill=(1, 1, 1))  # white for the text background
+    shape.commit()  # write everything to the output page
+
+"""
+Several other features can be added, like:
+- copy over the input metadata dictionary
+- copy over the input table of contents
+"""
+out.save("new-" + fname, deflate=True, garbage=4)
diff --git a/docs/shape.rst b/docs/shape.rst

new file mode 100644 (file)

index 0000000..e2bac93
--- /dev/null
+++ b/docs/shape.rst
@@ -0,0 +1,566 @@
+.. _Shape:
+
+Shape
+================
+
+This class allows creating interconnected graphical elements on a PDF page. Its methods have the same meaning and name as the corresponding :ref:`Page` methods.
+
+In fact, each :ref:`Page` draw method is just a convenience wrapper for (1) one shape draw method, (2) the :meth:`finish` method, and (3) the :meth:`commit` method. For page text insertion, only the :meth:`commit` method is invoked. If many draw and text operations are executed for a page, you should always consider using a Shape object.
+
+Several draw methods can be executed in a row and each one of them will contribute to one drawing. Once the drawing is complete, the :meth:`finish` method must be invoked to apply color, dashing, width, morphing and other attributes.
+
+**Draw** methods of this class (and :meth:`insertTextbox`) log the area they cover in a rectangle (:attr:`Shape.rect`). This property can, for instance, be used to set :attr:`Page.CropBox`.
+
+**Text insertions** :meth:`insertText` and :meth:`insertTextbox` implicitly execute a "finish" and therefore only require :meth:`commit` to become effective. As a consequence, both include parameters for controlling properties like colors, etc.
+
+================================ =====================================================
+**Method / Attribute**             **Description**
+================================ =====================================================
+:meth:`Shape.commit`             update the page's contents
+:meth:`Shape.drawBezier`         draw a cubic Bezier curve
+:meth:`Shape.drawCircle`         draw a circle around a point
+:meth:`Shape.drawCurve`          draw a cubic Bezier using one helper point
+:meth:`Shape.drawLine`           draw a line
+:meth:`Shape.drawOval`           draw an ellipse
+:meth:`Shape.drawPolyline`       connect a sequence of points
+:meth:`Shape.drawQuad`           draw a quadrilateral
+:meth:`Shape.drawRect`           draw a rectangle
+:meth:`Shape.drawSector`         draw a circular sector or piece of pie
+:meth:`Shape.drawSquiggle`       draw a squiggly line
+:meth:`Shape.drawZigzag`         draw a zigzag line
+:meth:`Shape.finish`             finish a set of draw commands
+:meth:`Shape.insertText`         insert text lines
+:meth:`Shape.insertTextbox`      fit text into a rectangle
+:attr:`Shape.doc`                stores the page's document
+:attr:`Shape.draw_cont`          draw commands since last *finish()*
+:attr:`Shape.height`             stores the page's height
+:attr:`Shape.lastPoint`          stores the current point
+:attr:`Shape.page`               stores the owning page
+:attr:`Shape.rect`               rectangle surrounding drawings
+:attr:`Shape.text_cont`          accumulated text insertions
+:attr:`Shape.totalcont`          accumulated string to be stored in :data:`contents`
+:attr:`Shape.width`              stores the page's width
+================================ =====================================================
+
+**Class API**
+
+.. class:: Shape
+
+   .. method:: __init__(self, page)
+
+      Create a new drawing. When PyMuPDF is imported, the *fitz.Page* object is given the convenience method *newShape()*, which constructs a *Shape* object. During instantiation, a check is made whether the page belongs to a PDF; otherwise an exception is raised.
+
+      :arg page: an existing page of a PDF document.
+      :type page: :ref:`Page`
+
+   .. method:: drawLine(p1, p2)
+
+      Draw a line from :data:`point_like` objects *p1* to *p2*.
+
+      :arg point_like p1: starting point
+
+      :arg point_like p2: end point
+
+      :rtype: :ref:`Point`
+      :returns: the end point, *p2*.
+
+   .. index::
+      pair: breadth; drawSquiggle
+
+   .. method:: drawSquiggle(p1, p2, breadth=2)
+
+      Draw a squiggly (wavy, undulated) line from :data:`point_like` objects *p1* to *p2*. An integer number of full wave periods will always be drawn, one period having a length of *4 * breadth*. The breadth parameter will be adjusted as necessary to meet this condition. The drawn line will always turn "left" when leaving *p1* and always join *p2* from the "right".
+
+      :arg point_like p1: starting point
+
+      :arg point_like p2: end point
+
+      :arg float breadth: the amplitude of each wave. The condition *2 * breadth < abs(p2 - p1)* must be true to fit in at least one wave. See the following picture, which shows two points connected by one full period.
+
+      :rtype: :ref:`Point`
+      :returns: the end point, *p2*.
+
+      .. image:: images/img-breadth.png
+
+      Here is an example of three connected lines, forming a closed, filled triangle. Little arrows indicate the stroking direction.
+
+      .. image:: images/img-squiggly.png
+
+      .. note:: Waves drawn are **not** trigonometric (sine / cosine). If you need that, have a look at `draw-sines.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/draw-sines.py>`_.
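The breadth adjustment described for *drawSquiggle()* and *drawZigzag()* can be illustrated in plain Python. The helper *squiggle_periods* below is hypothetical and not part of PyMuPDF; in particular, rounding to the nearest whole number of periods is an assumption of this sketch, not necessarily what PyMuPDF does internally.

```python
import math


def squiggle_periods(p1, p2, breadth=2.0):
    """Hypothetical helper (not part of PyMuPDF).

    Choose a whole number of wave periods between p1 and p2, where one
    period is 4 * breadth long, and return (periods, adjusted_breadth).
    """
    length = math.dist(p1, p2)
    if not 2 * breadth < length:
        raise ValueError("2 * breadth must be smaller than abs(p2 - p1)")
    # Rounding to the nearest whole period is an assumption of this sketch.
    periods = max(1, round(length / (4 * breadth)))
    return periods, length / (4 * periods)


# A 100 pixel line with breadth 3: 8 periods, breadth stretched to 3.125.
print(squiggle_periods((0, 0), (100, 0), 3))
```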
+
+   .. index::
+      pair: breadth; drawZigzag
+
+   .. method:: drawZigzag(p1, p2, breadth=2)
+
+      Draw a zigzag line from :data:`point_like` objects *p1* to *p2*. An integer number of full zigzag periods will always be drawn, one period having a length of *4 * breadth*. The breadth parameter will be adjusted to meet this condition. The drawn line will always turn "left" when leaving *p1* and always join *p2* from the "right".
+
+      :arg point_like p1: starting point
+
+      :arg point_like p2: end point
+
+      :arg float breadth: the amplitude of the movement. The condition *2 * breadth < abs(p2 - p1)* must be true to fit in at least one period.
+
+      :rtype: :ref:`Point`
+      :returns: the end point, *p2*.
+
+   .. method:: drawPolyline(points)
+
+      Draw several connected lines between points contained in the sequence *points*. This can be used for creating arbitrary polygons by setting the last item equal to the first one.
+
+      :arg sequence points: a sequence of :data:`point_like` objects. Its length must be at least 2 (in which case it is equivalent to *drawLine()*).
+
+      :rtype: :ref:`Point`
+      :returns: *points[-1]* -- the last point in the argument sequence.
+
+   .. method:: drawBezier(p1, p2, p3, p4)
+
+      Draw a standard cubic Bezier curve from *p1* to *p4*, using *p2* and *p3* as control points.
+
+      All arguments are :data:`point_like` \s.
+
+      :rtype: :ref:`Point`
+      :returns: the end point, *p4*.
+
+      .. note:: The points do not need to be different -- experiment a bit with some of them being equal!
+
+      Example:
+
+      .. image:: images/img-drawBezier.png
+
+   .. method:: drawOval(tetra)
+
+      Draw an "ellipse" inside the given tetragon (quadrilateral). If it is a square, a circle is drawn; a general rectangle results in an ellipse. If a quadrilateral is used instead, a plethora of shapes can be the result.
+
+      The drawing starts and ends at the middle point of the line connecting bottom-left and top-left corners in an anti-clockwise movement.
+
+      :arg rect_like,quad_like tetra: :data:`rect_like` or :data:`quad_like`.
+
+          *Changed in version 1.14.5:*  tetragons are now also supported.
+
+      :rtype: :ref:`Point`
+      :returns: the middle point of line from *rect.bl* to *rect.tl*, or from *quad.ll* to *quad.ul*, respectively. Look at just a few examples here, or at the *quad-show?.py* scripts in the PyMuPDF-Utilities repository.
+
+      .. image:: images/img-drawquad.jpg
+         :scale: 50
+
+   .. method:: drawCircle(center, radius)
+
+      Draw a circle given its center and radius. The drawing starts and ends at point *center - (radius, 0)* in an anti-clockwise movement. This corresponds to the middle point of the enclosing rectangle's left side.
+
+      The method is a shortcut for *drawSector(center, start, 360, fullSector=False)*, where *start = center - (radius, 0)*. To draw a circle in a clockwise movement, use :meth:`drawSector` with an angle of -360 instead.
+
+      :arg center: the center of the circle.
+      :type center: point_like
+
+      :arg float radius: the radius of the circle. Must be positive.
+
+      :rtype: :ref:`Point`
+      :returns: *center - (radius, 0)*.
+
+      .. image:: images/img-drawcircle.jpg
+         :scale: 60
+
+   .. method:: drawCurve(p1, p2, p3)
+
+      A special case of *drawBezier()*: Draw a cubic Bezier curve from *p1* to *p3*. On each of the two lines from *p1* to *p2* and from *p2* to *p3*, one control point is generated. This guarantees that the curve's curvature does not change its sign. If the two connecting lines meet at a right angle, the resulting curve is a quarter of an ellipse (or of a circle, if both lines have the same length).
+
+      All arguments are :data:`point_like`.
+
+      :rtype: :ref:`Point`
+      :returns: the end point, *p3*.
+
+      Example: a filled quarter ellipse segment.
+
+      .. image:: images/img-drawCurve.png
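A minimal sketch of how the two control points of *drawCurve()* could be placed, assuming each one lies a fixed fraction *kappa* of the way from an end point toward *p2*. The value 4/3 * (sqrt(2) - 1), about 0.5523, is the classic constant for approximating a quarter circle with one cubic Bezier; that PyMuPDF uses exactly this value is an assumption here, and both helpers are hypothetical.

```python
import math

# Classic constant for approximating a quarter circle with one cubic Bezier.
KAPPA = 4 * (math.sqrt(2) - 1) / 3  # about 0.5523


def curve_control_points(p1, p2, p3, kappa=KAPPA):
    """Hypothetical helper: one control point on p1->p2, one on p3->p2."""
    k1 = (p1[0] + (p2[0] - p1[0]) * kappa, p1[1] + (p2[1] - p1[1]) * kappa)
    k2 = (p3[0] + (p2[0] - p3[0]) * kappa, p3[1] + (p2[1] - p3[1]) * kappa)
    return k1, k2


def bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    return tuple(
        u ** 3 * a + 3 * u * u * t * b + 3 * u * t * t * c + t ** 3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


# Right angle at p2 with legs of equal length 10: a quarter circle
# around (0, 10) from (0, 0) to (10, 10).
k1, k2 = curve_control_points((0, 0), (10, 0), (10, 10))
mid = bezier_point((0, 0), k1, k2, (10, 10), 0.5)
print(k1, k2, mid)
```

With a right angle at *p2* and legs of equal length, the curve's midpoint lands exactly on the quarter circle, which is how this constant is derived.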
+
+   .. index::
+      pair: fullSector; drawSector
+
+   .. method:: drawSector(center, point, angle, fullSector=True)
+
+      Draw a circular sector, optionally connecting the arc to the circle's center (like a piece of pie).
+
+      :arg point_like center: the center of the circle.
+
+      :arg point_like point: one of the two end points of the pie's arc segment. The other one is calculated from the *angle*.
+
+      :arg float angle: the angle of the sector in degrees. Used to calculate the other end point of the arc. Depending on its sign, the arc is drawn anti-clockwise (positive) or clockwise (negative).
+
+      :arg bool fullSector: whether to draw connecting lines from the ends of the arc to the circle center. If a fill color is specified, the full "pie" is colored, otherwise just the sector.
+
+      :returns: the other end point of the arc. Can be used as starting point for a following invocation to create logically connected pie charts.
+      :rtype: :ref:`Point`
+
+      Examples:
+
+      .. image:: images/img-drawSector1.png
+
+      .. image:: images/img-drawSector2.png
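To see how *angle* determines the other end point of the arc, here is a small geometric sketch. *arc_end_point* is a hypothetical helper, not a PyMuPDF function. It assumes page coordinates with the y-axis pointing down, so that a positive angle appears anti-clockwise on screen.

```python
import math


def arc_end_point(center, point, angle):
    """Hypothetical helper: rotate `point` around `center` by `angle` degrees.

    Assumes page coordinates with y growing downwards, so a positive
    angle looks anti-clockwise on screen and a negative one clockwise.
    """
    rad = math.radians(angle)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + dx * math.cos(rad) + dy * math.sin(rad),
        center[1] - dx * math.sin(rad) + dy * math.cos(rad),
    )


# Chain the end points like consecutive pie pieces: 4 pieces of 90 degrees.
p = (110.0, 100.0)
for _ in range(4):
    p = arc_end_point((100.0, 100.0), p, 90)
print(p)  # back at the starting point, up to rounding
```

Feeding each returned point into the next call mirrors how the return value of *drawSector()* is meant to chain pie pieces.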
+
+
+   .. method:: drawRect(rect)
+
+      Draw a rectangle. The drawing starts and ends at the top-left corner in an anti-clockwise movement.
+
+      :arg rect_like rect: where to put the rectangle on the page.
+
+      :rtype: :ref:`Point`
+      :returns: top-left corner of the rectangle.
+
+   .. method:: drawQuad(quad)
+
+      Draw a quadrilateral. The drawing starts and ends at the top-left corner (:attr:`Quad.ul`) in an anti-clockwise movement. It invokes :meth:`drawPolyline` with the argument *[ul, ll, lr, ur, ul]*.
+
+      :arg quad_like quad: where to put the tetragon on the page.
+
+      :rtype: :ref:`Point`
+      :returns: :attr:`Quad.ul`.
+
+   .. index::
+      pair: border_width; insertText
+      pair: color; insertText
+      pair: encoding; insertText
+      pair: fill; insertText
+      pair: fontfile; insertText
+      pair: fontname; insertText
+      pair: fontsize; insertText
+      pair: morph; insertText
+      pair: render_mode; insertText
+      pair: rotate; insertText
+
+   .. method:: insertText(point, text, fontsize=11, fontname="helv", fontfile=None, set_simple=False, encoding=TEXT_ENCODING_LATIN, color=None, fill=None, render_mode=0, border_width=1, rotate=0, morph=None)
+
+      Insert text lines, starting at *point*.
+
+      :arg point_like point: the bottom-left position of the first character of *text* in pixels. It is important to understand how this works in conjunction with the *rotate* parameter. Please have a look at the following picture. The small red dots indicate the positions of *point* in each of the four possible cases.
+
+         .. image:: images/img-inserttext.jpg
+            :scale: 33
+
+      :arg str/sequence text: the text to be inserted. May be specified as either a string type or as a sequence type. For sequences, or strings containing line breaks *\n*, several lines will be inserted. No check is made whether lines are too wide, but the number of inserted lines is limited by the "vertical" space on the page (in the sense of the reading direction established by the *rotate* parameter). Any remaining text is discarded; the return value, however, contains the number of inserted lines.
+
+      :arg int rotate: determines whether to rotate the text. Acceptable values are multiples of 90 degrees. Default is 0 (no rotation), meaning horizontal text lines oriented from left to right. 180 means text is shown upside down from **right to left**. 90 means anti-clockwise rotation, text running **upwards**. 270 (or -90) means clockwise rotation, text running **downwards**. In any case, *point* specifies the bottom-left coordinates of the first character's rectangle. Multiple lines, if present, always follow the reading direction established by this parameter. So line 2 is located **above** line 1 in case of *rotate = 180*, etc.
+
+      :rtype: int
+      :returns: number of lines inserted.
+
+      For a description of the other parameters see :ref:`CommonParms`.
+
+   .. index::
+      pair: align; insertTextbox
+      pair: border_width; insertTextbox
+      pair: color; insertTextbox
+      pair: encoding; insertTextbox
+      pair: expandtabs; insertTextbox
+      pair: fill; insertTextbox
+      pair: fontfile; insertTextbox
+      pair: fontname; insertTextbox
+      pair: fontsize; insertTextbox
+      pair: morph; insertTextbox
+      pair: render_mode; insertTextbox
+      pair: rotate; insertTextbox
+
+   .. method:: insertTextbox(rect, buffer, fontsize=11, fontname="helv", fontfile=None, set_simple=False, encoding=TEXT_ENCODING_LATIN, color=None, fill=None, render_mode=0, border_width=1, expandtabs=8, align=TEXT_ALIGN_LEFT, rotate=0, morph=None)
+
+      PDF only: Insert text into the specified rectangle. The text will be split into lines and words and then filled into the available space, starting from one of the four rectangle corners, which depends on *rotate*. Line feeds and multiple spaces are respected.
+
+      :arg rect_like rect: the area to use. It must be finite and not empty.
+
+      :arg str/sequence buffer: the text to be inserted. Must be specified as a string or a sequence of strings. Line breaks are respected also when occurring in a sequence entry.
+
+      :arg int align: align each text line. Default is 0 (left). Centered, right and justified are the other supported options, see :ref:`TextAlign`. Please note that the effect of parameter value *TEXT_ALIGN_JUSTIFY* is only achievable with "simple" (single-byte) fonts (including the :ref:`Base-14-Fonts`). Refer to :ref:`AdobeManual`, section 5.2.2, page 399.
+
+      :arg int expandtabs: controls handling of tab characters *\t* using the *str.expandtabs()* method **for each line**.
+
+      :arg int rotate: requests text to be rotated in the rectangle. This value must be a multiple of 90 degrees. Default is 0 (no rotation). Effectively, four different values are processed: 0, 90, 180 and 270 (= -90), each causing the text to start in a different rectangle corner. Bottom-left is 90, bottom-right is 180, and -90 / 270 is top-right. See the first example below for how text is filled into a rectangle. This argument takes precedence over morphing. See the second example, which shows text first rotated left by 90 degrees and then the whole rectangle rotated clockwise around its lower left corner.
+
+      :rtype: float
+      :returns:
+          **If positive or zero**: successful execution. The value returned is the unused rectangle line space in pixels. This may safely be ignored -- or be used to optimize the rectangle, position subsequent items, etc.
+
+          **If negative**: no execution. The value returned is the space deficit to store text lines. Enlarge rectangle, decrease *fontsize*, decrease text amount, etc.
+
+      .. image:: images/img-rotate.png
+
+      .. image:: images/img-rot+morph.png
+
+      For a description of the other parameters see :ref:`CommonParms`.
+
+   .. index::
+      pair: closePath; finish
+      pair: color; finish
+      pair: dashes; finish
+      pair: even_odd; finish
+      pair: fill; finish
+      pair: lineCap; finish
+      pair: lineJoin; finish
+      pair: morph; finish
+      pair: width; finish
+
+   .. method:: finish(width=1, color=None, fill=None, lineCap=0, lineJoin=0, dashes=None, closePath=True, even_odd=False, morph=(fixpoint, matrix))
+
+      Finish a set of *draw*()* methods by applying :ref:`CommonParms` to all of them. This method also supports morphing the resulting compound drawing using a fixpoint :ref:`Point`.
+
+      :arg sequence morph: morph the text or the compound drawing around some arbitrary :ref:`Point` *fixpoint* by applying :ref:`Matrix` *matrix* to it. This implies that *fixpoint* is a **fixed point** of this operation: it will not change its position. Default is no morphing (*None*). The matrix can contain any values in its first 4 components, *matrix.e == matrix.f == 0* must be true, however. This means that any combination of scaling, shearing, rotating, flipping, etc. is possible, but translations are not.
+
+      :arg bool even_odd: request the **"even-odd rule"** for filling operations. Default is *False*, so that the **"nonzero winding number rule"** is used. These rules are alternative methods to apply the fill color where areas overlap. Different results are to be expected only with fairly complex shapes. For an in-depth explanation, see :ref:`AdobeManual`, pp. 232 ff. Here is an example to demonstrate the difference.
+
+      .. image:: images/img-even-odd.png
+
+      .. note:: For each pixel in a drawing the following will happen:
+
+         1. Rule **"even-odd"** counts how many areas are overlapping at a pixel. If this count is **odd**, the pixel is regarded **inside**; if it is **even**, the pixel is **outside**.
+
+         2. Default rule **"nonzero winding"** also looks at the orientation of overlapping areas: it **adds 1** if an area is drawn anti-clockwise and it **subtracts 1** for clockwise areas. If the result is zero, the pixel is regarded **outside**; pixels with a non-zero count are **inside**.
+
+         In the top two shapes, three circles are drawn in standard manner (anti-clockwise, look at the arrows). The lower two shapes contain one (top-left) circle drawn clockwise. As can be seen, area orientation is irrelevant for the even-odd rule.
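The two fill rules in the note above can be modeled per pixel as a sketch. *is_inside* is a hypothetical helper; each entry of *orientations* stands for one area covering the pixel, +1 if drawn anti-clockwise and -1 if drawn clockwise.

```python
def is_inside(orientations, rule="nonzero"):
    """Hypothetical helper: decide whether a pixel gets the fill color.

    `orientations` holds one entry per area covering the pixel:
    +1 for an area drawn anti-clockwise, -1 for one drawn clockwise.
    """
    if rule == "even-odd":
        # only the number of covering areas matters
        return len(orientations) % 2 == 1
    if rule == "nonzero":
        # the signed sum (winding number) must differ from zero
        return sum(orientations) != 0
    raise ValueError("rule must be 'even-odd' or 'nonzero'")


# Two anti-clockwise circles overlapping: filled only under "nonzero".
print(is_inside([1, 1]), is_inside([1, 1], "even-odd"))
```

Flipping one circle to clockwise (`[1, -1]`) empties the overlap under both rules, which matches the lower shapes in the picture.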
+
+   .. index::
+      pair: overlay; commit
+
+   .. method:: commit(overlay=True)
+
+      Update the page's :data:`contents` with the accumulated draw commands and text insertions. If a *Shape* is not committed, the page will not be changed.
+
+      The method will reset attributes :attr:`Shape.rect`, :attr:`lastPoint`, :attr:`draw_cont`, :attr:`text_cont` and :attr:`totalcont`. Afterwards, the shape object can be reused for the **same page**.
+
+      :arg bool overlay: determine whether to put content in foreground (default) or background. Relevant only if the page already has a non-empty :data:`contents` object.
+
+   .. attribute:: doc
+
+      For reference only: the page's document.
+
+      :type: :ref:`Document`
+
+   .. attribute:: page
+
+      For reference only: the owning page.
+
+      :type: :ref:`Page`
+
+   .. attribute:: height
+
+      Copy of the page's height.
+
+      :type: float
+
+   .. attribute:: width
+
+      Copy of the page's width.
+
+      :type: float
+
+   .. attribute:: draw_cont
+
+      Accumulated command buffer for **draw methods** since last finish.
+
+      :type: str
+
+   .. attribute:: text_cont
+
+      Accumulated text buffer. All **text insertions** go here. On :meth:`commit` this buffer will be appended to :attr:`totalcont`, so that text will never be covered by drawings in the same Shape.
+
+      :type: str
+
+   .. attribute:: rect
+
+      Rectangle surrounding drawings. This attribute is at your disposal and may be changed at any time. Its value is set to *None* when a shape is created or committed. Every *draw** method and :meth:`Shape.insertTextbox` update this property (i.e. **enlarge** the rectangle as needed). **Morphing** operations, however (:meth:`Shape.finish`, :meth:`Shape.insertTextbox`), are ignored.
+
+      A typical use of this attribute would be setting :attr:`Page.CropBox` to this value when you are creating shapes for later or external use. If you have not manipulated the attribute yourself, it should reflect a rectangle that contains all drawings so far.
+
+      If you have used morphing and need a rectangle containing the morphed objects, use the following code::
+
+         >>> # assuming ...
+         >>> morph = (point, matrix)
+         >>> # ... recalculate the shape rectangle like so:
+         >>> shape.rect = (shape.rect - fitz.Rect(point, point)) * ~matrix + fitz.Rect(point, point)
+
+      :type: :ref:`Rect`
+
+   .. attribute:: totalcont
+
+      Total accumulated command buffer for draws and text insertions. This will be used by :meth:`Shape.commit`.
+
+      :type: str
+
+   .. attribute:: lastPoint
+
+      For reference only: the current point of the drawing path. It is *None* at *Shape* creation and after each *finish()* and *commit()*.
+
+      :type: :ref:`Point`
+
+Usage
+------
+A drawing object is constructed by *shape = page.newShape()*. After this, as many draw, finish and text insertion methods as required may follow. Each sequence of draws must be finished before the drawing is committed. The overall coding pattern looks like this::
+
+   >>> shape = page.newShape()
+   >>> shape.draw1(...)
+   >>> shape.draw2(...)
+   >>> ...
+   >>> shape.finish(width=..., color=..., fill=..., morph=...)
+   >>> shape.draw3(...)
+   >>> shape.draw4(...)
+   >>> ...
+   >>> shape.finish(width=..., color=..., fill=..., morph=...)
+   >>> ...
+   >>> shape.insertText*
+   >>> ...
+   >>> shape.commit()
+   >>> ....
+
+.. note::
+
+   1. Each *finish()* combines the preceding draws into one logical shape, giving it common colors, line width, morphing, etc. If *closePath* is specified, it will also connect the end point of the last draw with the starting point of the first one.
+
+   2. To successfully create compound graphics, let each draw method use the end point of the previous one as its starting point. In the above pseudo code, *draw2* should hence use the returned :ref:`Point` of *draw1* as its starting point. Failing to do so would automatically start a new path, and *finish()* may not work as expected (but it won't complain either).
+
+   3. Text insertions may occur anywhere before the commit (they neither touch :attr:`Shape.draw_cont` nor :attr:`Shape.lastPoint`). They are appended to *Shape.totalcont* directly, whereas draws will be appended by *Shape.finish*.
+
+   4. Each *commit* takes all text insertions and shapes and places them in foreground or background on the page -- thus providing a way to control graphical layers.
+
+   5. **Only** *commit* **will update** the page's contents, the other methods are basically string manipulations.
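The buffer handling described in these notes can be mimicked by a toy class. *ToyShape* is hypothetical and greatly simplified (no colors, no y-axis flip, dummy text operators without a font); it is not PyMuPDF's implementation. It only shows why a draw that does not start at the previous end point opens a new path, and why text ends up on top of drawings after *commit()*.

```python
class ToyShape:
    """Toy model of the Shape buffers; NOT PyMuPDF's implementation."""

    def __init__(self):
        self.draw_cont = ""  # draw commands since the last finish()
        self.text_cont = ""  # accumulated text insertions
        self.totalcont = ""  # finished drawings
        self.last_point = None

    def draw_line(self, p1, p2):
        # open a new subpath ("m") only if p1 is not the current point
        if self.last_point != tuple(p1):
            self.draw_cont += "%g %g m\n" % tuple(p1)
        self.draw_cont += "%g %g l\n" % tuple(p2)
        self.last_point = tuple(p2)
        return self.last_point

    def finish(self, close_path=True):
        # move the collected draws into totalcont as one stroked path
        if close_path:
            self.draw_cont += "h\n"
        self.totalcont += self.draw_cont + "S\n"
        self.draw_cont = ""
        self.last_point = None

    def insert_text(self, text):
        # text bypasses draw_cont and last_point (no font, no escaping)
        self.text_cont += "BT (%s) Tj ET\n" % text

    def commit(self):
        # text goes last, so it is painted on top of the drawings
        content = self.totalcont + self.text_cont
        self.totalcont = self.text_cont = ""
        return content


shape = ToyShape()
p = shape.draw_line((0, 0), (10, 0))
p = shape.draw_line(p, (10, 10))  # starts at the previous end point
shape.finish()
shape.insert_text("hi")
print(shape.commit())
```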
+
+Examples
+---------
+1. Create a full circle of pieces of pie in different colors::
+
+      shape = page.newShape()  # start a new shape
+      cols = (...)  # a sequence of RGB color triples
+      pieces = len(cols)  # number of pieces to draw
+      beta = 360. / pieces  # angle of each piece of pie
+      center = fitz.Point(...)  # center of the pie
+      p0 = fitz.Point(...)  # starting point
+      for i in range(pieces):
+          p0 = shape.drawSector(center, p0, beta,
+                                fullSector=True) # draw piece
+          # now fill it but do not connect ends of the arc
+          shape.finish(fill=cols[i], closePath=False)
+      shape.commit()  # update the page
+
+Here is an example for 5 colors:
+
+.. image:: images/img-cake.png
+
+2. Create a regular n-sided polygon (fill yellow, red border). We use *drawSector()* only to calculate the points on the circumference, and empty the draw command buffer again before drawing the polygon::
+
+      shape = page.newShape() # start a new shape
+      beta = -360.0 / n  # our angle, drawn clockwise
+      center = fitz.Point(...)  # center of circle
+      p0 = fitz.Point(...)  # start here (1st vertex)
+      points = [p0]  # store polygon vertices
+      for i in range(n):  # calculate the vertices
+          p0 = shape.drawSector(center, p0, beta)
+          points.append(p0)
+      shape.draw_cont = ""  # do not draw the circle sectors
+      shape.drawPolyline(points)  # draw the polygon
+      shape.finish(color=(1,0,0), fill=(1,1,0), closePath=False)
+      shape.commit()
+
+Here is the polygon for n = 7:
+
+.. image:: images/img-7edges.png
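The polygon vertices of example 2 can also be computed without a *Shape*, which may help when checking the result. The sketch below uses a hypothetical helper *regular_polygon* and, like the example, returns n + 1 points whose last one repeats the first.

```python
import math


def regular_polygon(center, start, n):
    """Hypothetical helper: vertices of a regular n-sided polygon.

    Returns n + 1 points; the last one repeats `start`, so the polygon is
    closed without needing closePath. With y growing downwards, increasing
    the angle moves clockwise on screen, like the negative beta in example 2.
    """
    radius = math.dist(center, start)
    phi0 = math.atan2(start[1] - center[1], start[0] - center[0])
    points = []
    for i in range(n + 1):
        phi = phi0 + 2 * math.pi * i / n
        points.append((center[0] + radius * math.cos(phi),
                       center[1] + radius * math.sin(phi)))
    return points


heptagon = regular_polygon((100, 100), (100, 50), 7)
print(len(heptagon))  # 8 points for n = 7
```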
+
+.. _CommonParms:
+
+Common Parameters
+-------------------
+
+**fontname** (*str*)
+
+  In general, there are three options:
+
+  1. Use one of the standard :ref:`Base-14-Fonts`. In this case, *fontfile* **must not** be specified and *"Helvetica"* is used if this parameter is omitted, too.
+  2. Choose a font already in use by the page. Then specify its **reference** name prefixed with a slash "/", see example below.
+  3. Specify a font file present on your system. In this case choose an arbitrary, but new name for this parameter (without "/" prefix).
+
+  If inserted text should re-use one of the page's fonts, use its reference name appearing in :meth:`getFontList` like so:
+
+  Suppose the font list has the entry *[1024, 0, 'Type1', 'CJXQIC+NimbusMonL-Bold', 'R366']*, then specify *fontname = "/R366", fontfile = None* to use font *CJXQIC+NimbusMonL-Bold*.
+
+----
+
+**fontfile** (*str*)
+
+  File path of a font existing on your computer. If you specify *fontfile*, make sure you use a *fontname* **not occurring** in the above list. This new font will be embedded in the PDF upon *doc.save()*. Similar to new images, a font file will be embedded only once. A table of MD5 codes for the binary font contents is used to ensure this.
+
+----
+
+**set_simple** (*bool*)
+
+  Fonts installed from files are installed as **Type0** fonts by default. If you want to use 1-byte characters only, set this to *True*. This setting cannot be reverted. Subsequent changes are ignored.
+
+----
+
+**fontsize** (*float*)
+
+  Font size of text. This also determines the line height as *fontsize * 1.2*.
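A back-of-the-envelope sketch of how the line height limits the number of lines that fit vertically, and of a leftover-space value whose sign convention resembles the one described for *insertTextbox()*. Both helpers are hypothetical; PyMuPDF's own fitting checks may differ in details.

```python
import math


def line_height(fontsize):
    """Line height as documented: fontsize * 1.2."""
    return fontsize * 1.2


def lines_that_fit(available_height, fontsize):
    """Hypothetical helper: how many whole lines fit into the given height."""
    return math.floor(available_height / line_height(fontsize))


def leftover_space(available_height, fontsize, n_lines):
    """Hypothetical helper: space left after n_lines lines (negative: deficit)."""
    return available_height - n_lines * line_height(fontsize)


# 100 pixels of height with fontsize 10 (line height 12): 8 lines, 4 left over.
print(lines_that_fit(100, 10), leftover_space(100, 10, 8))
```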
+
+----
+
+**dashes** (*str*)
+
+  Causes lines to be dashed. A continuous line with no dashes is drawn with *"[]0"* or *None*. For (the rather complex) details on how to achieve dashing effects, see :ref:`AdobeManual`, page 217. Simple versions look like *"[3 4]"*, which means dashes of 3 and gaps of 4 pixels length follow each other. *"[3 3]"* and *"[3]"* do the same thing.
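A small helper for building dash strings like the ones shown above: a PDF dash array followed by a phase. *dash_spec* is a hypothetical convenience function, not part of PyMuPDF.

```python
def dash_spec(pattern=(), phase=0):
    """Hypothetical helper: build a PDF dash string such as "[3 4] 0".

    `pattern` lists alternating dash and gap lengths; an empty pattern
    means a continuous line. `phase` is where in the pattern to start.
    """
    if any(length < 0 for length in pattern):
        raise ValueError("dash and gap lengths must not be negative")
    return "[%s] %g" % (" ".join("%g" % length for length in pattern), phase)


print(dash_spec((3, 4)))  # dashes of 3, gaps of 4
print(dash_spec())        # continuous line
```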
+
+----
+
+**color / fill** (*list, tuple*)
+
+  Line and fill colors can be specified as tuples or lists of floats from 0 to 1. These sequences must have a length of 1 (GRAY), 3 (RGB) or 4 (CMYK). For GRAY colorspace, a single float instead of the unwieldy *(float,)* tuple spec is also accepted.
+
+  To simplify color specification, method *getColor()* in *fitz.utils* may be used to get predefined RGB color triples by name. It accepts a string as the name of the color and returns the corresponding triple. The method knows over 540 color names -- see section :ref:`ColorDatabase`.
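A sketch of the color rules above: *normalize_color* accepts a single gray float or a sequence of 1, 3 or 4 components in [0, 1], and *srgb_to_rgb* converts an sRGB integer of the form 0xRRGGBB into an RGB triple. Both helpers are hypothetical; their names and error messages are assumptions of this sketch.

```python
def normalize_color(color):
    """Hypothetical helper: return a validated tuple of color components."""
    if isinstance(color, (int, float)):
        color = (color,)  # single gray value instead of a (float,) tuple
    color = tuple(float(c) for c in color)
    if len(color) not in (1, 3, 4):
        raise ValueError("color needs 1 (GRAY), 3 (RGB) or 4 (CMYK) components")
    if not all(0 <= c <= 1 for c in color):
        raise ValueError("color components must be between 0 and 1")
    return color


def srgb_to_rgb(srgb):
    """Hypothetical helper: split an sRGB integer 0xRRGGBB into floats."""
    return (
        ((srgb >> 16) & 255) / 255,
        ((srgb >> 8) & 255) / 255,
        (srgb & 255) / 255,
    )


print(normalize_color(0.5), srgb_to_rgb(0xFF8000))
```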
+
+----
+
+**border_width** (*float*)
+
+  *New in version 1.14.9:* Set the border width for text insertions. Relevant only if the render mode argument is used with a value greater than zero.
+
+----
+
+**render_mode** (*int*)
+
+  *New in version 1.14.9:* Integer in *range(8)* which controls the text appearance (:meth:`Shape.insertText` and :meth:`Shape.insertTextbox`). See page 398 in :ref:`AdobeManual`. These methods now also differentiate between fill and stroke colors.
+
+  * For default 0, only the text fill color is used to paint the text. For backward compatibility, using the *color* parameter instead also works.
+  * For render mode 1, only the border of each glyph (i.e. text character) is drawn with a thickness as set in argument *border_width*. The color chosen in the *color* argument is taken for this, the *fill* parameter is ignored.
+  * For render mode 2, the glyphs are filled and stroked, using both color parameters and the specified border width. You can use this value to simulate **bold text** without using another font: choose the same value for *fill* and *color* and an appropriate value for *border_width*.
+  * For render mode 3, the glyphs are neither stroked nor filled: the text becomes invisible.
+
+  The following examples use *border_width=0.3*, together with a fontsize of 15. Stroke color is blue and fill color is a shade of yellow.
+
+  .. image:: images/img-rendermode.jpg
+
+----
+
+**overlay** (*bool*)
+
+  Causes the item to appear in foreground (default) or background.
+
+----
+
+**morph** (*sequence*)
+
+  Causes "morphing" of either a shape, created by the *draw*()* methods, or the text inserted by page methods *insertTextbox()* / *insertText()*. If not *None*, it must be a pair *(fixpoint, matrix)*, where *fixpoint* is a :ref:`Point` and *matrix* is a :ref:`Matrix`. The matrix can be anything except translations, i.e. *matrix.e == matrix.f == 0* must be true. The point is used as a fixed point for the matrix operation. For example, if *matrix* is a rotation or scaling, then *fixpoint* is its center. Similarly, if *matrix* is a left-right or up-down flip, then the mirroring axis will be the vertical, respectively horizontal line going through *fixpoint*, etc.
+
+  .. note:: Several methods (like :meth:`Shape.insertText` or :meth:`Shape.drawRect`) check whether the items to be inserted will actually fit into the page. For the result of a morphing operation there is, however, no such guarantee: this is entirely the programmer's responsibility.
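The role of *fixpoint* can be illustrated with a minimal plain-Python sketch: move the fixpoint to the origin, apply the matrix, then move it back. This assumes PyMuPDF's matrix layout *(a, b, c, d, e, f)*, where a point *(x, y)* maps to *(a*x + c*y + e, b*x + d*y + f)*. The helpers `morph_point` and `rotation` are hypothetical illustrations, not PyMuPDF functions.

```python
import math


def morph_point(point, fixpoint, matrix):
    """Apply matrix (a, b, c, d, e, f) to point, keeping fixpoint in place.

    Assumes the row-vector convention (x, y) -> (a*x + c*y + e, b*x + d*y + f).
    """
    a, b, c, d, e, f = matrix
    if e != 0 or f != 0:
        raise ValueError("morph matrices must not contain translations")
    # shift so that fixpoint becomes the origin, transform, shift back
    x = point[0] - fixpoint[0]
    y = point[1] - fixpoint[1]
    return (a * x + c * y + fixpoint[0], b * x + d * y + fixpoint[1])


def rotation(deg):
    """Hypothetical helper: rotation matrix in (a, b, c, d, e, f) layout."""
    rad = math.radians(deg)
    return (math.cos(rad), math.sin(rad), -math.sin(rad), math.cos(rad), 0, 0)


fixpoint = (100, 100)
print(morph_point(fixpoint, fixpoint, rotation(90)))    # fixpoint stays put
print(morph_point((110, 100), fixpoint, rotation(90)))  # approximately (100, 110)
# a left-right flip mirrors at the vertical line through fixpoint
print(morph_point((110, 50), fixpoint, (-1, 0, 0, 1, 0, 0)))
```

The last line shows the mirroring case mentioned above: the point 10 units right of the vertical axis through *fixpoint* ends up 10 units to its left.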
+
+----
+
+**lineCap (deprecated: "roundCap")** (*int*)
+
+  Controls the look of line ends. The default value 0 lets each line end at exactly the given coordinate in a sharp edge. A value of 1 adds a semi-circle to the ends, whose center is the end point and whose diameter is the line width. Value 2 adds a half-square centered on the end point, with an edge length equal to the line width.
+
+  *Changed in version 1.14.15*
+
+----
+
+**lineJoin** (*int*)
+
+  *New in version 1.14.15:* Controls how line joins look. This may be either a sharp edge (0, "miter"), a rounded join (1), or a cut-off edge (2, "bevel").
+
+----
+
+**closePath** (*bool*)
+
+  Causes the end point of a drawing to be automatically connected with the starting point (by a straight line).
diff --git a/docs/text-lister.py b/docs/text-lister.py

new file mode 100644 (file)

index 0000000..9241410
--- /dev/null
+++ b/docs/text-lister.py
@@ -0,0 +1,40 @@
+import fitz
+
+
+def flags_decomposer(flags):
+    """Make font flags human readable."""
+    l = []
+    if flags & 2 ** 0:
+        l.append("superscript")
+    if flags & 2 ** 1:
+        l.append("italic")
+    if flags & 2 ** 2:
+        l.append("serifed")
+    else:
+        l.append("sans")
+    if flags & 2 ** 3:
+        l.append("monospaced")
+    else:
+        l.append("proportional")
+    if flags & 2 ** 4:
+        l.append("bold")
+    return ", ".join(l)
+
+
+doc = fitz.open("text-tester.pdf")
+page = doc[0]
+
+# read page text as a dictionary: flags=11 suppresses extra spaces in CJK fonts
+# and omits image blocks (which have no "lines" key)
+blocks = page.getText("dict", flags=11)["blocks"]
+for b in blocks:  # iterate through the text blocks
+    for l in b["lines"]:  # iterate through the text lines
+        for s in l["spans"]:  # iterate through the text spans
+            print("")
+            font_properties = "Font: '%s' (%s), size %g, color #%06x" % (
+                s["font"],  # font name
+                flags_decomposer(s["flags"]),  # readable font flags
+                s["size"],  # font size
+                s["color"],  # font color
+            )
+            print("Text: '%s'" % s["text"])  # simple print of text
+            print(font_properties)
diff --git a/docs/textpage.rst b/docs/textpage.rst

new file mode 100644 (file)

index 0000000..1c0b886
--- /dev/null
+++ b/docs/textpage.rst
@@ -0,0 +1,255 @@
+.. _TextPage:
+
+================
+TextPage
+================
+
+This class represents text and images shown on a document page. All MuPDF document types are supported.
+
+The usual ways to create a textpage are :meth:`DisplayList.getTextPage` and :meth:`Page.getTextPage`. Because this class has only a limited set of methods, the :ref:`Page` class provides wrappers which create an intermediate text page and then invoke one of the following methods. The last column of this table shows these corresponding :ref:`Page` methods.
+
+For a description of what this class is all about, see Appendix 2.
+
+======================== ================================ =============================
+**Method**               **Description**                  page getText or search method
+======================== ================================ =============================
+:meth:`~.extractText`    extract plain text               "text"
+:meth:`~.extractTEXT`    synonym of previous              "text"
+:meth:`~.extractBLOCKS`  plain text grouped in blocks     "blocks"
+:meth:`~.extractWORDS`   all words with their bbox        "words"
+:meth:`~.extractHTML`    page content in HTML format      "html"
+:meth:`~.extractJSON`    page content in JSON format      "json"
+:meth:`~.extractXHTML`   page content in XHTML format     "xhtml"
+:meth:`~.extractXML`     page text in XML format          "xml"
+:meth:`~.extractDICT`    page content in *dict* format    "dict"
+:meth:`~.extractRAWDICT` page content in *dict* format    "rawdict"
+:meth:`~.search`         Search for a string in the page  searchFor()
+======================== ================================ =============================
+
+**Class API**
+
+.. class:: TextPage
+
+   .. method:: extractText
+
+   .. method:: extractTEXT
+
+      Return a string of the page's complete text. The text is UTF-8 unicode and in the same sequence as specified at the time of document creation.
+
+      :rtype: str
+
+   .. method:: extractBLOCKS
+
+      Textpage content as a list of text lines grouped by block. Each list item looks like this::
+
+         (x0, y0, x1, y1, "lines in blocks", block_type, block_no)
+
+      The first four entries are the block's bbox coordinates, *block_type* is 1 for an image block, 0 for text. *block_no* is the block sequence number.
+
+      For an image block, its bbox and a text line with image meta information are included, but not the image data itself.
+
+      This is a high-speed method with enough information to rebuild a desired text sequence.
+
+      :rtype: list
+
+   .. method:: extractWORDS
+
+      Textpage content as a list of single words with bbox information. An item of this list looks like this::
+
+         (x0, y0, x1, y1, "word", block_no, line_no, word_no)
+
+      Everything wrapped in spaces is treated as a *"word"* with this method.
+
+      This is a high-speed method which e.g. allows extracting text from within a given rectangle.
+
+      :rtype: list
+
+   .. method:: extractHTML
+
+      Textpage content in HTML format. This version contains complete formatting and positioning information. Images are included (encoded as base64 strings). You need an HTML package to interpret the output in Python. Your internet browser should be able to adequately display this information, but see :ref:`HTMLQuality`.
+
+      :rtype: str
+
+   .. method:: extractDICT
+
+      Textpage content as a Python dictionary. Provides the same level of detail as HTML. See below for the structure.
+
+      :rtype: dict
+
+   .. method:: extractJSON
+
+      Textpage content in JSON format, created by *json.dumps(TextPage.extractDICT())*. It is included for backward compatibility; you will probably use this method only for writing the result to a file. The method detects binary image data, like *bytearray* and *bytes* (Python 3 only), and converts it to base64-encoded strings on JSON output.
+
+      :rtype: str
+
+   .. method:: extractXHTML
+
+      Textpage content in XHTML format. Text information detail is comparable with :meth:`extractTEXT`, but also contains images (base64 encoded). This method makes no attempt to re-create the original visual appearance.
+
+      :rtype: str
+
+   .. method:: extractXML
+
+      Textpage content in XML format. This contains complete formatting information about every single character on the page: font, size, line, paragraph, location, color, etc. Contains no images. You probably need an XML package to interpret the output in Python.
+
+      :rtype: str
+
+   .. method:: extractRAWDICT
+
+      Textpage content as a Python dictionary -- technically similar to :meth:`extractDICT`, and it contains that information as a subset (including any images). It provides additional detail down to each character, which makes using XML obsolete in many cases. See below for the structure.
+
+      :rtype: dict
+
+   .. method:: search(string, hit_max = 16, quads = False)
+
+      Search for *string* and return a list of found locations.
+
+      :arg str string: the string to search for. The search is case-insensitive: upper and lower case will both match.
+      :arg int hit_max: maximum number of returned hits (default 16).
+      :arg bool quads: return quadrilaterals instead of rectangles.
+      :rtype: list
+      :returns: a list of :ref:`Rect` or :ref:`Quad` objects, each surrounding a found *string* occurrence. Because the search string may contain spaces, its parts may be located on different lines. In this case, more than one rectangle (resp. quadrilateral) is returned. The method does **not support hyphenation**, so it will not find "meth-od" when searching for "method".
+
+      Example: if the search for the string "pymupdf" produces a hit like the one shown, the corresponding entry will be either the blue rectangle or, if *quads* was specified, *Quad(ul, ur, ll, lr)*.
+
+      .. image:: images/img-quads.jpg
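The tuples returned by :meth:`TextPage.extractBLOCKS` and :meth:`TextPage.extractWORDS` are plain Python data, so the use cases mentioned for them (rebuilding a text sequence, extracting text from within a rectangle) need no further PyMuPDF calls. The following is a minimal sketch under that assumption; the function names are hypothetical and the sample tuples are hand-made, not output from a real document.

```python
from itertools import groupby


def text_in_reading_order(blocks):
    """Join extractBLOCKS() text blocks top-to-bottom, then left-to-right."""
    text_blocks = [b for b in blocks if b[5] == 0]  # block_type 0 = text
    text_blocks.sort(key=lambda b: (b[1], b[0]))    # sort by y0, then x0
    return "".join(b[4] for b in text_blocks)


def words_in_rect(words, rect):
    """Return extractWORDS() words lying completely inside rect, line by line.

    rect is assumed to be a plain (x0, y0, x1, y1) tuple.
    """
    rx0, ry0, rx1, ry1 = rect
    inside = [w for w in words
              if rx0 <= w[0] and ry0 <= w[1] and w[2] <= rx1 and w[3] <= ry1]
    inside.sort(key=lambda w: (w[5], w[6], w[7]))  # block, line, word number
    lines = []
    for _, line_words in groupby(inside, key=lambda w: (w[5], w[6])):
        lines.append(" ".join(w[4] for w in line_words))
    return "\n".join(lines)


# hand-made sample tuples, not output from a real document
blocks = [
    (50, 300, 500, 320, "second paragraph\n", 0, 0),
    (50, 100, 500, 120, "first paragraph\n", 0, 1),
    (50, 150, 500, 280, "<image meta information>", 1, 2),
]
words = [
    (10, 10, 40, 20, "Hello", 0, 0, 0),
    (45, 10, 80, 20, "world", 0, 0, 1),
    (300, 10, 340, 20, "outside", 0, 0, 2),
    (10, 30, 40, 40, "next", 0, 1, 0),
]
print(text_in_reading_order(blocks))
print(words_in_rect(words, (0, 0, 100, 50)))
```

The simple *(y0, x0)* sort key suits single-column pages; for multi-column layouts a different key (for example, by column first) would probably be needed.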
+
+.. _textpagedict:
+
+Dictionary Structure of :meth:`extractDICT` and :meth:`extractRAWDICT`
+-------------------------------------------------------------------------
+
+.. image:: images/img-textpage.png
+   :scale: 66
+
+Page Dictionary
+~~~~~~~~~~~~~~~~~
+=============== ============================================
+**Key**         **Value**
+=============== ============================================
+width           page width in pixels *(float)*
+height          page height in pixels *(float)*
+blocks          *list* of block dictionaries
+=============== ============================================
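As described for :meth:`TextPage.extractJSON`, binary image data in such a dictionary must be converted to base64 strings before it can be serialized as JSON. A minimal sketch of that idea with the standard library follows. It is not PyMuPDF's own implementation; the encoder class name is hypothetical and the dictionary is a hand-made stand-in for an :meth:`TextPage.extractDICT` result.

```python
import base64
import json


class BinaryToBase64Encoder(json.JSONEncoder):
    """Hypothetical JSON encoder turning bytes / bytearray into base64 text."""

    def default(self, obj):
        if isinstance(obj, (bytes, bytearray)):
            return base64.b64encode(bytes(obj)).decode("ascii")
        return super().default(obj)  # raises TypeError for other types


# hand-made stand-in for an extractDICT() result with one image block
page_dict = {
    "width": 595.0,
    "height": 842.0,
    "blocks": [{"type": 1, "bbox": (0.0, 0.0, 10.0, 10.0), "image": b"\x89PNG"}],
}
print(json.dumps(page_dict, cls=BinaryToBase64Encoder))
```

Note that JSON has no tuple type, so the *bbox* tuples come back as lists after a round trip.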
+
+Block Dictionaries
+~~~~~~~~~~~~~~~~~~
+Blocks come in two different formats: **image blocks** and **text blocks**.
+
+**Image block:**
+
+=============== ===============================================================
+**Key**             **Value**
+=============== ===============================================================
+type            1 = image *(int)*
+bbox            block / image rectangle, formatted as *tuple(fitz.Rect)*
+ext             image type *(str)*, as file extension, see below
+width           original image width *(int)*
+height          original image height *(int)*
+colorspace      colorspace.n *(int)*
+xres            resolution in x-direction *(int)*
+yres            resolution in y-direction *(int)*
+bpc             bits per component *(int)*
+image           image content *(bytes or bytearray)*
+=============== ===============================================================
+
+Possible values of key "ext" are "bmp", "gif", "jpeg", "jpx" (JPEG 2000), "jxr" (JPEG XR), "png", "pnm", and "tiff".
+
+.. note::
+
+   1. In some error situations, all of the above values may be zero or empty. So please be prepared to handle items like::
+
+      {"type": 1, "bbox": (0.0, 0.0, 0.0, 0.0), ..., "image": b""}
+
+
+   2. :ref:`TextPage` and the corresponding method :meth:`Page.getText` are **available for all document types**. For PDF documents only, methods :meth:`Document.getPageImageList` / :meth:`Page.getImageList` offer some overlapping functionality as far as image lists are concerned. However, both lists **may or may not** contain the same items. Differences are most probably caused by one of the following:
+
+       - "Inline" images (see page 352 of the :ref:`AdobeManual`) of a PDF page are contained in a textpage, but **not in** :meth:`Page.getImageList`.
+       - Image blocks in a textpage are generated for **every** image location -- whether or not there are any duplicates. This is in contrast to :meth:`Page.getImageList`, which will contain each image only once.
+       - Images mentioned in the page's :data:`object` definition will **always** appear in :meth:`Page.getImageList` [#f1]_. However, it may happen that there is no "display" command in the page's :data:`contents` (erroneously or on purpose). In this case the image will **not appear** in the textpage.
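Because image blocks carry the image bytes together with a file extension in key "ext", they can be written to disk directly. The sketch below assumes *blocks* is the "blocks" list of an :meth:`TextPage.extractDICT` result and skips the empty error-situation entries from note 1. The function name, file naming scheme and sample data are hypothetical.

```python
import os
import tempfile


def save_image_blocks(blocks, folder, prefix="img"):
    """Write each non-empty image block to folder; return the file names."""
    names = []
    for i, block in enumerate(blocks):
        if block.get("type") != 1:  # 0 = text block
            continue
        if not block.get("image") or not block.get("ext"):
            continue  # error situation: empty image data (see note 1)
        name = os.path.join(folder, "%s-%i.%s" % (prefix, i, block["ext"]))
        with open(name, "wb") as f:
            f.write(block["image"])
        names.append(name)
    return names


# hand-made sample blocks, not output from a real document
sample_blocks = [
    {"type": 0, "bbox": (0, 0, 1, 1), "lines": []},
    {"type": 1, "bbox": (0, 0, 1, 1), "ext": "png", "image": b"\x89PNG..."},
    {"type": 1, "bbox": (0.0, 0.0, 0.0, 0.0), "ext": "", "image": b""},
]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image_blocks(sample_blocks, tmp)
    print([os.path.basename(n) for n in saved])
```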
+
+
+**Text block:**
+
+=============== ====================================================
+**Key**             **Value**
+=============== ====================================================
+type            0 = text *(int)*
+bbox            block rectangle, formatted as *tuple(fitz.Rect)*
+lines           *list* of text line dictionaries
+=============== ====================================================
+
+Line Dictionary
+~~~~~~~~~~~~~~~~~
+
+=============== =====================================================
+**Key**             **Value**
+=============== =====================================================
+bbox            line rectangle, formatted as *tuple(fitz.Rect)*
+wmode           writing mode *(int)*: 0 = horizontal, 1 = vertical
+dir             writing direction *(list of floats)*: *[x, y]*
+spans           *list* of span dictionaries
+=============== =====================================================
+
+The value of key *"dir"* is a **unit vector** and should be interpreted as follows:
+
+* *x*: positive = "left-right", negative = "right-left", 0 = neither
+* *y*: positive = "top-bottom", negative = "bottom-top", 0 = neither
+
+The values indicate the "relative writing speed" in each direction, such that x\ :sup:`2` + y\ :sup:`2` = 1. In other words *dir = [cos(beta), sin(beta)]*, where *beta* is the writing angle relative to the horizontal.
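Since *dir = [cos(beta), sin(beta)]*, the writing angle can be recovered with ``atan2``. A minimal sketch, where `writing_angle` is a hypothetical helper rather than a PyMuPDF function:

```python
import math


def writing_angle(direction):
    """Return the writing angle beta in degrees for a line's "dir" value."""
    x, y = direction
    return math.degrees(math.atan2(y, x))


print(writing_angle([1.0, 0.0]))   # horizontal, left to right
print(writing_angle([0.0, 1.0]))   # top to bottom
print(writing_angle([-1.0, 0.0]))  # right to left
```

Using ``atan2`` rather than ``acos(x)`` keeps the sign of *y*, so "top-bottom" and "bottom-top" lines yield angles of opposite sign.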
+
+Span Dictionary
+~~~~~~~~~~~~~~~~~
+
+Spans contain the actual text. A line contains more than one span **only if** it contains text with different font properties.
+
+*(Changed in version 1.14.17)* Spans now also have a *bbox* key (again).
+
+=============== =====================================================================
+**Key**             **Value**
+=============== =====================================================================
+bbox            span rectangle, formatted as *tuple(fitz.Rect)*
+font            font name *(str)*
+size            font size *(float)*
+flags           font characteristics *(int)*
+color           text color in sRGB format *(int)*
+text            (only for :meth:`extractDICT`) text *(str)*
+chars           (only for :meth:`extractRAWDICT`) *list* of character dictionaries
+=============== =====================================================================
+
+*(New in version 1.16.0)*
+
+*"color"* is the text color encoded in sRGB format, e.g. 0xFF0000 for red.
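The sRGB integer packs red, green and blue into one number, one byte each. A plain-Python sketch for unpacking it follows; both helper names are hypothetical. The float variant matches the *0 <= color <= 1* convention that PyMuPDF color arguments use elsewhere.

```python
def srgb_to_rgb(color):
    """Split an sRGB integer like 0xFF0000 into (red, green, blue), each 0..255."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


def srgb_to_floats(color):
    """Same components, scaled to floats between 0 and 1."""
    return tuple(c / 255 for c in srgb_to_rgb(color))


print(srgb_to_rgb(0xFF0000))     # (255, 0, 0), i.e. red
print(srgb_to_floats(0x0000FF))  # (0.0, 0.0, 1.0), i.e. blue
```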
+
+*"flags"* is an integer, encoding bools of font properties:
+
+* bit 0: superscripted (2\ :sup:`0`)
+* bit 1: italic (2\ :sup:`1`)
+* bit 2: serifed (2\ :sup:`2`)
+* bit 3: monospaced (2\ :sup:`3`)
+* bit 4: bold (2\ :sup:`4`)
+
+Test these characteristics like so:
+
+>>> if flags & 2**1: print("italic")
+>>> # etc.
+
+Character Dictionary for :meth:`extractRAWDICT`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We are currently providing the bbox in :data:`rect_like` format. In a future version, we might change that to :data:`quad_like`. This image shows the relationship between items in the following table: |textpagechar|
+
+.. |textpagechar| image:: images/img-textpage-char.png
+   :align: top
+   :scale: 66
+
+=============== =========================================================
+**Key**             **Value**
+=============== =========================================================
+origin          *tuple* coordinates of the character's bottom left point
+bbox            character rectangle, formatted as *tuple(fitz.Rect)*
+c               the character (unicode)
+=============== =========================================================
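Because every character dictionary carries its own *c* and *bbox*, a span's text and an enclosing rectangle can be rebuilt from :meth:`TextPage.extractRAWDICT` output. A minimal sketch, assuming bboxes are plain *(x0, y0, x1, y1)* tuples as in the table above. The helper is hypothetical, the sample span is hand-made, and the computed rectangle is only an approximation that need not equal the span's own *bbox* value.

```python
def span_text_and_bbox(span):
    """Rebuild text and an enclosing rectangle from a rawdict span's chars."""
    chars = span["chars"]
    text = "".join(ch["c"] for ch in chars)
    if not chars:
        return text, None
    x0 = min(ch["bbox"][0] for ch in chars)
    y0 = min(ch["bbox"][1] for ch in chars)
    x1 = max(ch["bbox"][2] for ch in chars)
    y1 = max(ch["bbox"][3] for ch in chars)
    return text, (x0, y0, x1, y1)


# hand-made sample span, not output from a real document
span = {
    "chars": [
        {"origin": (10, 20), "bbox": (10, 10, 16, 22), "c": "H"},
        {"origin": (16, 20), "bbox": (16, 10, 21, 22), "c": "i"},
    ]
}
print(span_text_and_bbox(span))
```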
+
+.. rubric:: Footnotes
+
+.. [#f1] Image specifications for a PDF page are done in the page's sub-dictionary */Resources*. Being a text format specification, PDF does not prevent one from having arbitrary image entries in this dictionary -- whether actually in use by the page or not. On top of this, resource dictionaries can be **inherited** from the page's parent object -- like a node of the PDF's :data:`pagetree` or the :data:`catalog` object. So the PDF creator may e.g. define one file level */Resources* naming all images and fonts ever used by any page. In this case, :meth:`Page.getImageList` and :meth:`Page.getFontList` will always return the same lists for all pages.
diff --git a/docs/textwriter.rst b/docs/textwriter.rst

new file mode 100644 (file)

index 0000000..dc62f68
--- /dev/null
+++ b/docs/textwriter.rst
@@ -0,0 +1,106 @@
+.. _TextWriter:
+
+================
+TextWriter
+================
+
+*(New in v1.16.18)* This class represents a MuPDF *text* object. It can be thought of as a collection of text *"spans"*. Each span has its own starting position, font and font size. Compared with :meth:`Page.insertText` and friends, it is an elegant alternative for writing text to PDF pages:
+
+* **Improved text positioning:** Choose any point where insertion of a text span should start. Storing a text span returns the coordinates of the *last character* of the span.
+* **Free font choice:** Each text span has its own font and fontsize. This lets you easily switch between fonts and font characteristics when composing a larger text.
+* **Automatic fallback fonts:** If a character is not represented by the chosen font, alternative fonts are automatically searched. This significantly reduces the risk of seeing unprintable symbols in the output ("TOFUs"). PyMuPDF now also comes with the **universal font "Droid Sans Fallback Regular"**, which supports **all Latin** characters (as well as Cyrillic and Greek), and **all CJK** characters (Chinese, Japanese, Korean).
+* **Cyrillic and Greek Support:** The :ref:`Base-14-fonts` have integrated support for Cyrillic and Greek characters **without specifying encoding.** If your text is a mixture of Latin, Greek and Cyrillic, it will be shown correctly if you just use e.g. font "Helvetica".
+* **Transparency support:** Parameter *opacity* is supported. This offers a handy way to create watermark-style text.
+* **Justified text:** Supported for any font -- not just simple fonts as in :meth:`Page.insertText`.
+* **Reusability:** A TextWriter object exists independently of any page. It can be written multiple times, either to the same or to other pages, in the same or in different PDFs, choosing different colors or transparency.
+
+Using this object entails three steps:
+
+1. When **created**, a TextWriter requires a fixed **page rectangle** in relation to which it calculates text span positions. Text can be written to a page if and only if its size equals that of the TextWriter.
+2. Store text in the TextWriter using methods :meth:`TextWriter.append` and :meth:`TextWriter.fillTextbox` as often as desired.
+3. Output the TextWriter object on some PDF page with a compatible size.
+
+.. note:: Starting with version 1.17.0, TextWriters **do support** text rotation via the *morph* parameter of :meth:`TextWriter.writeText`.
+
+There also exists :meth:`Page.writeText` which lets you combine one or more TextWriters and jointly write them to a given rectangle and with a given rotation angle -- much like :meth:`Page.showPDFpage`.
+
+**Class API**
+
+.. class:: TextWriter
+
+   .. method:: __init__(self, rect, opacity=1, color=None)
+
+      :arg rect-like rect: rectangle internally used for text positioning computations.
+      :arg float opacity: sets the transparency for the text to store here. Values outside the interval ``[0, 1)`` will be ignored. A value of e.g. 0.5 means 50% transparency.
+      :arg float,sequ color: the color of the text. All colors are specified as floats *0 <= color <= 1*. A single float represents some gray level, a sequence implies the colorspace via its length.
+
+
+   .. method:: append(pos, text, font=None, fontsize=11, language=None)
+
+      Add new text, usually (but not necessarily) representing a text span.
+
+      :arg point_like pos: start position of the text, the bottom left point of the first character.
+      :arg str text: a string (Python 2: unicode is mandatory!) of arbitrary length. It will be written starting at position "pos".
+      :arg font: a :ref:`Font`. If omitted, ``fitz.Font("helv")`` will be used.
+      :arg float fontsize: the fontsize, a positive number, default 11.
+      :arg str language: the language to use, e.g. "en" for English. Meaningful values should be compliant with the ISO 639 standards 1, 2, 3 or 5. Reserved for future use: currently has no effect as far as we know.
+
+      :returns: :attr:`textRect` and :attr:`lastPoint`.
+
+   .. method:: fillTextbox(rect, text, pos=None, font=None, fontsize=11, align=0, warn=True)
+
+      Fill a given rectangle with text. This is a convenience method to use as an alternative to :meth:`append`.
+
+      :arg rect_like rect: the area to fill. No part of the text will appear outside of this.
+      :arg str,sequ text: the text. Can be specified as a (UTF-8) string or a list / tuple of strings. A string will first be converted to a list using *splitlines()*. Every list item will begin on a new line (forced line breaks).
+      :arg point_like pos: *(new in v1.17.3)* start storing at this point. Default is a point near rectangle top-left.
+      :arg font: the :ref:`Font`, default `fitz.Font("helv")`.
+      :arg float fontsize: the fontsize.
+      :arg int align: text alignment. Use one of TEXT_ALIGN_LEFT, TEXT_ALIGN_CENTER, TEXT_ALIGN_RIGHT or TEXT_ALIGN_JUSTIFY.
+      :arg bool warn: warn on text overflow (default), or raise an exception. In any case, text not fitting will not be written.
+
+   .. note:: Use these methods as often as is required -- there is no technical limit (except memory constraints of your system). You can also mix appends and text boxes and have multiple of both. Text positioning is controlled by the insertion point. There is no need to adhere to any order.
+
+
+   .. method:: writeText(page, opacity=None, color=None, morph=None, overlay=True)
+
+      Write the TextWriter text to a page.
+
+      :arg page: write to this :ref:`Page`.
+      :arg float opacity: override the value of the TextWriter for this output.
+      :arg sequ color: override the value of the TextWriter for this output.
+      :arg sequ morph: modify the text appearance by applying a matrix to it. If provided, this must be a sequence *(fixpoint, matrix)* with a point-like *fixpoint* and a matrix-like *matrix*. A typical example is rotating the text around *fixpoint*. 
+      :arg bool overlay: put in foreground (default) or background.
+
+
+   .. attribute:: textRect
+
+      The :ref:`Rect` currently occupied. This value changes when more text is added.
+
+   .. attribute:: lastPoint
+
+      The "cursor position" -- a :ref:`Point` -- after the last written character (its bottom-right).
+
+   .. attribute:: opacity
+
+      The text opacity (modifiable).
+
+   .. attribute:: color
+
+      The text color (modifiable).
+
+   .. attribute:: rect
+
+      The page rectangle for which this TextWriter was created. Must not be modified.
+
+
+To see some demo scripts dealing with TextWriter, have a look at `this <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/textwriter>`_ repository.
+
+
+.. note::
+
+  1. Opacity and color apply to **all the text** in this object. 
+  2. If you need different colors / transparency, you must create a separate TextWriter. Whenever you determine the color should change, simply append the text to the respective TextWriter using the previously returned :attr:`lastPoint` as position for the new text span.
+  3. Appending items or text boxes can occur in arbitrary order: only the position parameter controls where text appears.
+  4. Font and fontsize can freely vary within the same TextWriter. This can be used to let text with different properties appear on the same displayed line: just specify *pos* accordingly, and e.g. set it to :attr:`lastPoint` of the previously added item.
+  5. You can use the *pos* argument of :meth:`TextWriter.fillTextbox` to indent the first line, so its text may continue any preceding one in a continuous manner.
diff --git a/docs/tools.rst b/docs/tools.rst

new file mode 100644 (file)

index 0000000..eaf4fdf
--- /dev/null
+++ b/docs/tools.rst
@@ -0,0 +1,248 @@
+.. _Tools:
+
+Tools
+================
+
+This class is a collection of utility methods and attributes, mainly around memory management. To simplify and speed up its use, it is automatically instantiated under the name *TOOLS* when PyMuPDF is imported.
+
+================================== =================================================
+**Method / Attribute**             **Description**
+================================== =================================================
+:meth:`Tools.gen_id`               generate a unique identifier
+:meth:`Tools.image_profile`        report basic image properties
+:meth:`Tools.store_shrink`         shrink the storables cache [#f1]_
+:meth:`Tools.mupdf_warnings`       return the accumulated MuPDF warnings
+:meth:`Tools.mupdf_display_errors` show or set display of MuPDF errors
+:meth:`Tools.reset_mupdf_warnings` empty the MuPDF warnings store
+:meth:`Tools.set_aa_level`         set the anti-aliasing values
+:meth:`Tools.show_aa_level`        return the anti-aliasing values
+:attr:`Tools.fitz_config`          configuration settings of PyMuPDF
+:attr:`Tools.store_maxsize`        maximum storables cache size
+:attr:`Tools.store_size`           current storables cache size
+================================== =================================================
+
+**Class API**
+
+.. class:: Tools
+
+   .. method:: gen_id()
+
+      A convenience method returning a unique positive integer which will increase by 1 on every invocation. Example usages include creating unique keys in databases; generating them should be faster than using timestamps by an order of magnitude.
+
+      .. note:: MuPDF has dropped support for this in v1.14.0, so we have re-implemented a similar function with the following differences:
+
+            * It is not part of MuPDF's global context and not threadsafe (not an issue because we do not support threads in PyMuPDF anyway).
+            * It is implemented as *int*. This means that the maximum number is *sys.maxsize*. Should this number ever be exceeded, the counter starts over again at 1.
+
+      :rtype: int
+      :returns: a unique positive integer.
+
+   .. method:: image_profile(stream)
+
+      *(New in v1.16.17)* Show important properties of an image provided as a memory area. Its main purpose is to avoid using other Python packages just to determine basic properties.
+
+      :arg bytes,bytearray stream: the image data.
+      :rtype: dict
+      :returns: a dictionary with the keys "width", "height", "xres", "yres", "colorspace" (the *colorspace.n* value, number of colorants), "cs-name" (the *colorspace.name* value), "bpc", "ext" (image type as file extension). The values for these keys are the same as returned by :meth:`Document.extractImage`. Please also have a look at :data:`resolution`.
+      
+      .. note::
+
+        * For some "exotic" images (FAX encodings, RAW formats and the like), this method will not work and will return *None*. You can, however, still work with such images in PyMuPDF, e.g. by using :meth:`Document.extractImage` or by creating pixmaps via ``Pixmap(doc, xref)``. These methods will automatically convert exotic images to the PNG format before returning results.
+
+        * Some examples::
+
+               In [1]: import fitz
+               In [2]: stream = open(<image.file>, "rb").read()
+               In [3]: fitz.TOOLS.image_profile(stream)
+               Out[3]:
+               {'width': 439,
+               'height': 501,
+               'xres': 96,
+               'yres': 96,
+               'colorspace': 3,
+               'bpc': 8,
+               'ext': 'jpeg',
+               'cs-name': 'DeviceRGB'}
+               In [4]: doc=fitz.open(<input.pdf>)
+               In [5]: stream = doc.xrefStreamRaw(5)  # no decompression!
+               In [6]: fitz.TOOLS.image_profile(stream)
+               Out[6]:
+               {'width': 816,
+               'height': 1056,
+               'xres': 96,
+               'yres': 96,
+               'colorspace': 1,
+               'bpc': 8,
+               'ext': 'jpeg',
+               'cs-name': 'DeviceGray'}
+
+   .. method:: store_shrink(percent)
+
+      Reduce the storables cache by a percentage of its current size.
+
+      :arg int percent: the percentage of current size to free. If 100 or more, the store will be emptied; if zero, nothing will happen. MuPDF's caching strategy is "least recently used", so low-usage elements get deleted first.
+
+      :rtype: int
+      :returns: the new current store size. Depending on the situation, the size reduction may be larger than the requested percentage.
+
+   .. method:: show_aa_level()
+
+      *(New in version 1.16.14)* Return the current anti-aliasing values. These values control the rendering quality of graphics and text elements.
+
+      :rtype: dict
+      :returns: A dictionary with the following initial content: ``{'graphics': 8, 'text': 8, 'graphics_min_line_width': 0.0}``.
+
+
+   .. method:: set_aa_level(level)
+
+      *(New in version 1.16.14)* Set the new number of bits to use for anti-aliasing. The same value is currently used for both graphics and text rendering. This might change in a future MuPDF release.
+
+      :arg int level: an integer ranging between 0 and 8. Values outside this range will be silently changed to valid values. The value will remain in effect throughout the current session or until changed again.
+
+
+   .. method:: reset_mupdf_warnings()
+
+      *(New in version 1.16.0)*
+      
+      Empty MuPDF warnings message buffer.
+
+
+   .. method:: mupdf_display_errors(value=None)
+
+      *(New in version 1.16.8)*
+      
+      Show or set whether MuPDF errors should be displayed.
+
+      :arg bool value: if not a bool, the current setting is returned. If true, MuPDF errors will be shown on *sys.stderr*, otherwise suppressed. In any case, messages continue to be stored in the warnings store. Upon import of PyMuPDF this value is *True*.
+
+      :returns: *True* or *False*
+
+
+   .. method:: mupdf_warnings(reset=True)
+
+      *(New in version 1.16.0)*
+      
+      Return all stored MuPDF messages as a string with interspersed line-breaks.
+
+      :arg bool reset: *(new in version 1.16.7)* whether to automatically empty the store.
+
+
+   .. attribute:: fitz_config
+
+      A dictionary containing the actual values used for configuring PyMuPDF and MuPDF. Also refer to the installation chapter. This is an overview of the keys, each of which describes the status of a support aspect.
+
+      ================= ===================================================
+      **Key**           **Support included for ...**
+      ================= ===================================================
+      plotter-g         Gray colorspace rendering
+      plotter-rgb       RGB colorspace rendering
+      plotter-cmyk      CMYK colorspace rendering
+      plotter-n         overprint rendering
+      pdf               PDF documents
+      xps               XPS documents
+      svg               SVG documents
+      cbz               CBZ documents
+      img               IMG documents
+      html              HTML documents
+      epub              EPUB documents
+      jpx               JPEG2000 images
+      js                JavaScript
+      tofu              all TOFU fonts
+      tofu-cjk          CJK font subset (China, Japan, Korea)
+      tofu-cjk-ext      CJK font extensions
+      tofu-cjk-lang     CJK font language extensions
+      tofu-emoji        TOFU emoji fonts
+      tofu-historic     TOFU historic fonts
+      tofu-symbol       TOFU symbol fonts
+      tofu-sil          TOFU SIL fonts
+      icc               ICC profiles
+      py-memory         using Python memory management [#f2]_
+      base14            Base-14 fonts (should always be true)
+      ================= ===================================================
+
+      For an explanation of the term "TOFU" see `this Wikipedia article <https://en.wikipedia.org/wiki/Noto_fonts>`_. Example::
+
+       In [1]: import fitz
+       In [2]: fitz.TOOLS.fitz_config
+       Out[2]:
+       {'plotter-g': True,
+        'plotter-rgb': True,
+        'plotter-cmyk': True,
+        'plotter-n': True,
+        'pdf': True,
+        'xps': True,
+        'svg': True,
+        'cbz': True,
+        'img': True,
+        'html': True,
+        'epub': True,
+        'jpx': True,
+        'js': True,
+        'tofu': False,
+        'tofu-cjk': True,
+        'tofu-cjk-ext': False,
+        'tofu-cjk-lang': False,
+        'tofu-emoji': False,
+        'tofu-historic': False,
+        'tofu-symbol': False,
+        'tofu-sil': False,
+        'icc': True,
+        'py-memory': True, # (False if Python 2)
+        'base14': True}
+
+      :rtype: dict
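+
+      As an illustration, here is a minimal sketch that lists the features a given build lacks. The helper name *missing_features* is hypothetical (it is not part of PyMuPDF). It accepts :attr:`Tools.fitz_config` or any dictionary of the same shape::
+
+         def missing_features(config):
+             """Return the sorted keys of a fitz_config-style dict whose value is False."""
+             return sorted(key for key, enabled in config.items() if not enabled)
+
+         # usage, assuming PyMuPDF is installed:
+         # import fitz
+         # print(missing_features(fitz.TOOLS.fitz_config))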
+
+   .. attribute:: store_maxsize
+
+      Maximum storables cache size in bytes. PyMuPDF is generated with a value of 268,435,456 (256 MB, the default value), so this is the value you should normally see here. A value of zero means that the store may grow without limit.
+
+      :rtype: int
+
+   .. attribute:: store_size
+
+      Current storables cache size in bytes. This value may change (and will usually increase) with every use of a PyMuPDF function. It decreases automatically only when :attr:`Tools.store_maxsize` would otherwise be exceeded. In this case, MuPDF evicts low-usage objects until the value is back in range.
+
+      :rtype: int
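+
+      The following sketch turns the two store attributes into a usage percentage. The function *store_usage_percent* is a hypothetical helper, not a PyMuPDF API. It treats a maximum of zero as "unlimited", as described for :attr:`Tools.store_maxsize`::
+
+         def store_usage_percent(store_size, store_maxsize):
+             """Percentage of the storables cache in use, or None if the store is unlimited."""
+             if store_maxsize == 0:  # zero means unlimited growth is permitted
+                 return None
+             return 100.0 * store_size / store_maxsize
+
+         # usage, assuming PyMuPDF is installed:
+         # import fitz
+         # print(store_usage_percent(fitz.TOOLS.store_size, fitz.TOOLS.store_maxsize))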
+
+Example Session
+----------------
+
+.. highlight:: python
+
+::
+
+   >>> import fitz
+   # print the maximum and current cache sizes
+   >>> fitz.TOOLS.store_maxsize
+   268435456
+   >>> fitz.TOOLS.store_size
+   0
+   >>> doc = fitz.open("demo1.pdf")
+   # pixmap creation puts lots of objects in the cache (text, images, fonts),
+   # apart from the pixmap itself
+   >>> pix = doc[0].getPixmap(alpha=False)
+   >>> fitz.TOOLS.store_size
+   454519
+   # release (at least) 50% of the storage
+   >>> fitz.TOOLS.store_shrink(50)
+   13471
+   >>> fitz.TOOLS.store_size
+   13471
+   # get a few unique numbers
+   >>> fitz.TOOLS.gen_id()
+   1
+   >>> fitz.TOOLS.gen_id()
+   2
+   >>> fitz.TOOLS.gen_id()
+   3
+   # close document and see how much cache is still in use
+   >>> doc.close()
+   >>> fitz.TOOLS.store_size
+   0
+   >>>
+
+
+.. rubric:: Footnotes
+
+.. [#f1] This memory area is used internally by MuPDF. It serves as a cache for objects that have already been read and interpreted, which improves performance. The bulkiest object types are images and fonts. When an application starts up the MuPDF library (in our case this happens as part of *import fitz*), it must specify a maximum size for this area. PyMuPDF uses the default value (256 MB) to limit memory consumption. Use the methods here to control or investigate store usage. For example, even after a document has been closed and all related objects have been deleted, the store usage may still not drop to zero. So you might want to empty the store (e.g. with *TOOLS.store_shrink(100)*) before opening another document.
+
+.. [#f2] Optionally, all dynamic memory management can be done using Python C-level calls. MuPDF offers a hook to insert user-preferred memory managers. We use this option for Python 3 since PyMuPDF v1.13.19. At the same time, all memory allocation in PyMuPDF itself is also routed to Python (i.e. there are no more direct *malloc()* calls in the code). We have seen improved memory usage and slightly reduced runtimes with this option set. If you want to change this, set *#define JM_MEMORY 0* (standard C malloc) or *#define JM_MEMORY 1* (Python allocation) in file *fitz.i* and then generate PyMuPDF.
diff --git a/docs/tutorial.rst b/docs/tutorial.rst

new file mode 100644 (file)

index 0000000..9dce148
--- /dev/null
+++ b/docs/tutorial.rst
@@ -0,0 +1,351 @@
+.. _Tutorial:
+
+=========
+Tutorial
+=========
+
+.. highlight:: python
+
+This tutorial shows you, step by step, how to use PyMuPDF, the Python bindings for MuPDF.
+
+MuPDF supports not only PDF but also XPS, OpenXPS, CBZ, CBR, FB2 and EPUB formats, and so does PyMuPDF [#f1]_. For the sake of brevity, however, we will only talk about PDF files. Wherever a feature is supported for PDF files only, this is mentioned explicitly.
+
+Importing the Bindings
+==========================
+The Python bindings to MuPDF are made available by this import statement. We also show here how your version can be checked::
+
+    >>> import fitz
+    >>> print(fitz.__doc__)
+    PyMuPDF 1.16.0: Python bindings for the MuPDF 1.16.0 library.
+    Version date: 2019-07-28 07:30:14.
+    Built for Python 3.7 on win32 (64-bit).
+
+
+Opening a Document
+======================
+To access a supported document, it must be opened with the following statement::
+
+    doc = fitz.open(filename)     # or fitz.Document(filename)
+
+This creates the :ref:`Document` object *doc*. *filename* must be a Python string specifying the name of an existing file.
+
+It is also possible to open a document from memory data, or to create a new, empty PDF. See :ref:`Document` for details.
+
+A document has many attributes and methods. Among them are metadata (like "author" or "subject"), the total number of pages, and outline and encryption information.
+
+Some :ref:`Document` Methods and Attributes
+=============================================
+
+=========================== ==========================================
+**Method / Attribute**      **Description**
+=========================== ==========================================
+:attr:`Document.pageCount`  the number of pages (*int*)
+:attr:`Document.metadata`   the metadata (*dict*)
+:meth:`Document.getToC`     get the table of contents (*list*)
+:meth:`Document.loadPage`   read a :ref:`Page`
+=========================== ==========================================
+
+Accessing Meta Data
+========================
+PyMuPDF fully supports standard metadata. :attr:`Document.metadata` is a Python dictionary with the following keys. It is available for **all document types**, though not all entries may contain data. For details of their meanings and formats consult the respective manuals, e.g. :ref:`AdobeManual` for PDF. Further information can also be found in chapter :ref:`Document`. The metadata fields are strings, or *None* if not otherwise indicated. Be aware that a field which is not *None* may still not contain meaningful data.
+
+============== =================================
+**Key**        **Value**
+============== =================================
+producer       producer (producing software)
+format         format: 'PDF-1.4', 'EPUB', etc.
+encryption     encryption method used if any
+author         author
+modDate        date of last modification
+keywords       keywords
+title          title
+creationDate   date of creation
+creator        creating application
+subject        subject
+============== =================================
+
+.. note:: Apart from these standard metadata, **PDF documents** starting from PDF version 1.4 may also contain so-called *"metadata streams"*. Information in such streams is coded in XML. PyMuPDF deliberately contains no XML components, so we do not directly support access to information contained therein. But you can extract the stream as a whole, inspect or modify it using a package like `lxml <https://pypi.org/project/lxml/>`_ and then store the result back into the PDF. If you want, you can also delete these data altogether.
+
+.. note:: The repository contains two utility scripts: one to `import <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/csv2meta.py>`_ metadata from CSV files (PDF only), and one to `export <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/meta2csv.py>`_ metadata to CSV files.
+
+Working with Outlines
+=========================
+The easiest way to get all outlines (also called "bookmarks") of a document is to load its *table of contents*::
+
+    toc = doc.getToC()
+
+This will return a Python list of lists *[[lvl, title, page, ...], ...]* which looks much like a conventional table of contents found in books.
+
+*lvl* is the hierarchy level of the entry (starting from 1), *title* is the entry's title, and *page* the page number (1-based!). Other parameters describe details of the bookmark target.
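+
+Because *lvl* starts at 1 and *page* is already 1-based, both can be used directly, e.g. to print an indented outline. The following is a minimal sketch. *format_toc* is a hypothetical helper that relies only on the first three items of each entry::
+
+    def format_toc(toc):
+        """Render a getToC()-style list [[lvl, title, page, ...], ...] as an indented outline."""
+        lines = []
+        for entry in toc:
+            lvl, title, page = entry[:3]  # ignore any extra target details
+            lines.append("%s%s (page %i)" % ("  " * (lvl - 1), title, page))
+        return "\n".join(lines)
+
+    # usage: print(format_toc(doc.getToC()))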
+
+.. note:: The repository contains two utility scripts: one to `import <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/csv2toc.py>`_ a table of contents from CSV files (PDF only), and one to `export <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/toc2csv.py>`_ a table of contents to CSV files.
+
+Working with Pages
+======================
+:ref:`Page` handling is at the core of MuPDF's functionality.
+
+* You can render a page into a raster or vector (SVG) image, optionally zooming, rotating, shifting or shearing it.
+* You can extract a page's text and images in many formats and search for text strings.
+* For PDF documents many more methods are available to add text or images to pages.
+
+First, a :ref:`Page` must be created. This is done by a method of :ref:`Document`::
+
+    page = doc.loadPage(pno)  # loads page number 'pno' of the document (0-based)
+    page = doc[pno]  # the short form
+
+Any integer *-inf < pno < pageCount* is possible here. Negative numbers count backwards from the end, so *doc[-1]* is the last page, like with Python sequences.
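+
+The rule for negative numbers can be sketched in pure Python. *normalize_pno* is a hypothetical helper, not part of PyMuPDF. Raising *ValueError* for numbers that are too large is this sketch's own choice::
+
+    def normalize_pno(pno, page_count):
+        """Convert a page number with -inf < pno < page_count to a 0-based index."""
+        if page_count <= 0 or pno >= page_count:
+            raise ValueError("page number out of range")
+        return pno % page_count  # Python's % wraps negative numbers into 0..page_count-1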
+
+A more advanced way is to use the document as an **iterator** over its pages::
+
+    for page in doc:
+        print(page.number)  # do something with 'page'
+
+    # ... or read backwards
+    for page in reversed(doc):
+        print(page.number)  # do something with 'page'
+
+    # ... or even use 'slicing'
+    for page in doc.pages(start, stop, step):
+        print(page.number)  # do something with 'page'
+
+
+Once you have your page, here is what you would typically do with it:
+
+Inspecting the Links, Annotations or Form Fields of a Page
+-----------------------------------------------------------
+Links are shown as "hot areas" when a document is displayed with some viewer software. If you click while your cursor shows a hand symbol, you will usually be taken to the target that is encoded in that hot area. Here is how to get all links::
+
+    # get all links on a page
+    links = page.getLinks()
+
+*links* is a Python list of dictionaries. For details see :meth:`Page.getLinks`.
+
+You can also use an iterator which emits one link at a time::
+
+    for link in page.links():
+        print(link)  # do something with 'link'
+
+If dealing with a PDF document page, there may also exist annotations (:ref:`Annot`) or form fields (:ref:`Widget`), each of which has its own iterator::
+
+    for annot in page.annots():
+        print(annot)  # do something with 'annot'
+
+    for field in page.widgets():
+        print(field)  # do something with 'field'
+
+
+Rendering a Page
+-----------------------
+This example creates a **raster** image of a page's content::
+
+    pix = page.getPixmap()
+
+*pix* is a :ref:`Pixmap` object which (in this case) contains an **RGB** image of the page, ready to be used for many purposes. Method :meth:`Page.getPixmap` offers lots of variations for controlling the image: resolution, colorspace (e.g. to produce a grayscale image or an image with a subtractive color scheme), transparency, rotation, mirroring, shifting, shearing, etc. For example: to create an **RGBA** image (i.e. containing an alpha channel), specify *pix = page.getPixmap(alpha=True)*.
+
+A :ref:`Pixmap` offers a number of methods and attributes. Among them are the integers *width*, *height* (each in pixels) and *stride* (number of bytes of one horizontal image line). Attribute *samples* contains the image data of this rectangular area as a Python *bytes* object.
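+
+To locate a single pixel in *samples*, combine *stride* with the number of bytes per pixel. The sketch below assumes one byte per component, so the bytes per pixel equal the component count :attr:`Pixmap.n` (color components plus alpha, if present), and that rows are stored top to bottom. *pixel_at* is a hypothetical helper::
+
+    def pixel_at(samples, stride, n, x, y):
+        """Return the n component values of pixel (x, y) as a tuple of ints."""
+        offset = y * stride + x * n  # skip y full lines, then x pixels
+        return tuple(samples[offset:offset + n])
+
+    # usage: pixel_at(pix.samples, pix.stride, pix.n, 0, 0) gives the top-left pixel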
+
+.. note:: You can also create a **vector** image of a page by using :meth:`Page.getSVGimage`. Refer to this `Wiki <https://github.com/pymupdf/PyMuPDF/wiki/Vector-Image-Support>`_ for details.
+
+Saving the Page Image in a File
+-----------------------------------
+We can simply store the image in a PNG file::
+
+    pix.writeImage("page-%i.png" % page.number)
+
+Displaying the Image in GUIs
+-------------------------------------------
+We can also use it in GUI dialog managers. :attr:`Pixmap.samples` contains the bytes of all pixels as a Python bytes object. Here are some examples; find more in the `examples <https://github.com/pymupdf/PyMuPDF/tree/master/examples>`_ directory.
+
+wxPython
+~~~~~~~~~~~~~
+Consult their documentation for adjustments to RGB(A) pixmaps and, potentially, specifics for your wxPython release::
+
+    if pix.alpha:
+        bitmap = wx.Bitmap.FromBufferRGBA(pix.width, pix.height, pix.samples)
+    else:
+        bitmap = wx.Bitmap.FromBuffer(pix.width, pix.height, pix.samples)
+
+Tkinter
+~~~~~~~~~~
+Please also see section 3.19 of the `Pillow documentation <https://Pillow.readthedocs.io>`_::
+
+    from PIL import Image, ImageTk
+
+    # set the mode depending on alpha
+    mode = "RGBA" if pix.alpha else "RGB"
+    img = Image.frombytes(mode, [pix.width, pix.height], pix.samples)
+    tkimg = ImageTk.PhotoImage(img)
+
+The following **avoids using Pillow**::
+
+    import tkinter
+    import fitz
+
+    # remove alpha if present: PPM does not support transparency
+    pix1 = fitz.Pixmap(pix, 0) if pix.alpha else pix
+    imgdata = pix1.getImageData("ppm")  # extremely fast!
+    tkimg = tkinter.PhotoImage(data=imgdata)  # needs an existing Tk root window
+
+If you are looking for a complete Tkinter script paging through **any supported** document, `here it is! <https://github.com/JorjMcKie/PyMuPDF-Utilities/blob/master/doc-browser.py>`_ It can also zoom into pages, and it runs under Python 2 or 3. It requires the extremely handy `PySimpleGUI <https://pypi.org/project/PySimpleGUI/>`_ pure Python package.
+
+PyQt4, PyQt5, PySide
+~~~~~~~~~~~~~~~~~~~~~
+Please also see section 3.16 of the `Pillow documentation <https://Pillow.readthedocs.io>`_::
+
+    from PIL import Image, ImageQt
+
+    # set the mode depending on alpha
+    mode = "RGBA" if pix.alpha else "RGB"
+    img = Image.frombytes(mode, [pix.width, pix.height], pix.samples)
+    qtimg = ImageQt.ImageQt(img)
+
+Again, you can also do **without PIL** if you use the pixmap *stride* property::
+
+    from PyQt<x>.QtGui import QImage
+
+    # set the correct QImage format depending on alpha
+    fmt = QImage.Format_RGBA8888 if pix.alpha else QImage.Format_RGB888
+    qtimg = QImage(pix.samples, pix.width, pix.height, pix.stride, fmt)
+
+
+Extracting Text and Images
+---------------------------
+We can also extract all text, images and other information of a page in many different forms, and levels of detail::
+
+    text = page.getText(opt)
+
+Use one of the following strings for *opt* to obtain different formats [#f2]_:
+
+* *"text"*: (default) plain text with line breaks. No formatting, no text position details, no images.
+
+* *"blocks"*: generate a list of text blocks (= paragraphs).
+
+* *"words"*: generate a list of words (strings not containing spaces).
+
+* *"html"*: creates a full visual version of the page including any images. This can be displayed with your internet browser.
+
+* *"dict"* / *"json"*: same information level as HTML, but provided as a Python dictionary or a JSON string, respectively. See :meth:`TextPage.extractDICT` and :meth:`TextPage.extractJSON` for details of their structure.
+
+* *"rawdict"*: a super-set of :meth:`TextPage.extractDICT`. Like *"xml"*, it additionally provides detail information for each single character. See :meth:`TextPage.extractRAWDICT` for details of its structure.
+
+* *"xhtml"*: the same text information level as the *"text"* version, but includes images. Can also be displayed by internet browsers.
+
+* *"xml"*: contains no images, but full position and font information down to each single text character. Use an XML module to interpret.
+
+To give you an idea of the output of these alternatives, we have prepared example text extracts. See :ref:`Appendix2`.
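+
+As one example of post-processing, the *"words"* output can be joined back into lines. The sketch assumes that each item is a tuple *(x0, y0, x1, y1, word, block_no, line_no, word_no)*, which is the layout returned by *page.getText("words")* in this version. *words_to_lines* is a hypothetical helper::
+
+    from itertools import groupby
+
+    def words_to_lines(words):
+        """Join word tuples that share a block and line number into text lines."""
+        ordered = sorted(words, key=lambda w: (w[5], w[6], w[7]))
+        lines = []
+        for _, group in groupby(ordered, key=lambda w: (w[5], w[6])):
+            lines.append(" ".join(w[4] for w in group))
+        return lines
+
+    # usage: print("\n".join(words_to_lines(page.getText("words"))))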
+
+Searching for Text
+-------------------
+You can find out, exactly where on a page a certain text string appears::
+
+    areas = page.searchFor("mupdf", hit_max=16)
+
+This delivers a list of up to 16 rectangles (see :ref:`Rect`), each of which surrounds one occurrence of the string "mupdf" (case insensitive). You could use this information to e.g. highlight those areas (PDF only) or create a cross reference of the document.
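+
+For a cross reference, the hits of each page can be collected and counted. Note that each count is capped by *hit_max*. This sketch only assumes that you pass a mapping of page numbers to hit lists, e.g. one built with *{page.number: page.searchFor("mupdf") for page in doc}*. *hit_counts* is a hypothetical helper::
+
+    def hit_counts(hits_by_page):
+        """Map each page number with at least one hit to its number of hits."""
+        return {pno: len(rects) for pno, rects in sorted(hits_by_page.items()) if rects}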
+
+Please also have a look at chapter :ref:`cooperation` and at demo programs `demo.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/demo.py>`_ and `demo-lowlevel.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/demo-lowlevel.py>`_. Among other things they contain details on how the :ref:`TextPage`, :ref:`Device` and :ref:`DisplayList` classes can be used for more direct control, e.g. when performance considerations suggest it.
+
+PDF Maintenance
+==================
+PDFs are the only document type that can be **modified** using PyMuPDF. Other file types are read-only.
+
+However, you can convert **any document** (including images) to a PDF and then apply all PyMuPDF features to the conversion result. Find out more in :meth:`Document.convertToPDF` and in the demo script `pdf-converter.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/pdf-converter.py>`_, which can convert any supported document to PDF.
+
+:meth:`Document.save()` always stores a PDF in its current (potentially modified) state on disk.
+
+You can usually choose whether to save to a new file, or just append your modifications to the existing one ("incremental save"), which is often much faster.
+
+The following describes ways to manipulate PDF documents. This description is by no means complete: much more can be found in the following chapters.
+
+Modifying, Creating, Re-arranging and Deleting Pages
+-------------------------------------------------------
+There are several ways to manipulate the so-called **page tree** (a structure describing all the pages) of a PDF:
+
+:meth:`Document.deletePage` and :meth:`Document.deletePageRange` delete pages.
+
+:meth:`Document.copyPage`, :meth:`Document.fullcopyPage` and :meth:`Document.movePage` copy or move a page to other locations within the same document.
+
+:meth:`Document.select` shrinks a PDF down to selected pages. Parameter is a sequence [#f3]_ of the page numbers that you want to keep. These integers must all be in range *0 <= i < pageCount*. When executed, all pages **missing** in this list will be deleted. Remaining pages will occur **in the sequence and as many times (!) as you specify them**.
+
+So you can easily create new PDFs with
+
+* the first or last 10 pages,
+* only the odd or only the even pages (for doing double-sided printing),
+* pages that **do** or **don't** contain a given text,
+* the pages in reverse order, ...
+
+... whatever you can think of.
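+
+The page lists for such selections are plain Python sequences of 0-based page numbers. Here is a minimal sketch (the helper names are hypothetical). Note that the "odd" pages 1, 3, 5 of a printed document have the 0-based numbers 0, 2, 4 and so on::
+
+    def first_pages(page_count, n=10):
+        return list(range(min(n, page_count)))
+
+    def last_pages(page_count, n=10):
+        return list(range(max(page_count - n, 0), page_count))
+
+    def odd_pages(page_count):  # printed page numbers 1, 3, 5, ...
+        return list(range(0, page_count, 2))
+
+    def even_pages(page_count):  # printed page numbers 2, 4, 6, ...
+        return list(range(1, page_count, 2))
+
+    def reversed_pages(page_count):
+        return list(range(page_count - 1, -1, -1))
+
+    def pages_containing(page_texts, needle):
+        """0-based numbers of the pages whose text contains needle."""
+        return [pno for pno, text in enumerate(page_texts) if needle in text]
+
+    # usage, e.g.: doc.select(odd_pages(len(doc))); doc.save("odd.pdf")
+    # or: doc.select(pages_containing([p.getText() for p in doc], "MuPDF"))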
+
+The saved new document will contain links, annotations and bookmarks that are still valid (i.e. those pointing either to a selected page or to some external resource).
+
+:meth:`Document.insertPage` and :meth:`Document.newPage` insert new pages.
+
+Pages themselves can moreover be modified by a range of methods (e.g. page rotation, annotation and link maintenance, text and image insertion).
+
+Joining and Splitting PDF Documents
+------------------------------------
+
+Method :meth:`Document.insertPDF` copies pages **between different** PDF documents. Here is a simple **joiner** example (*doc1* and *doc2* being opened PDFs)::
+
+    # append complete doc2 to the end of doc1
+    doc1.insertPDF(doc2)
+
+Here is a snippet that **splits** *doc1*. It creates a new document consisting of the first 10 and the last 10 pages of *doc1*::
+
+    doc2 = fitz.open()                 # new empty PDF
+    doc2.insertPDF(doc1, to_page = 9)  # first 10 pages
+    doc2.insertPDF(doc1, from_page = len(doc1) - 10) # last 10 pages
+    doc2.save("first-and-last-10.pdf")
+
+More can be found in the :ref:`Document` chapter. Also have a look at `PDFjoiner.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/PDFjoiner.py>`_.
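+
+The same idea extends to splitting a document into parts of equal size. Because *to_page* is inclusive (the snippet above copies 10 pages with *to_page = 9*), each part is described by an inclusive pair. The helper *chunk_ranges* and the output file names are hypothetical::
+
+    def chunk_ranges(page_count, size):
+        """Split pages 0..page_count-1 into inclusive (from_page, to_page) pairs."""
+        if size < 1:
+            raise ValueError("size must be at least 1")
+        return [(start, min(start + size, page_count) - 1)
+                for start in range(0, page_count, size)]
+
+    # usage, assuming doc1 is an open PDF:
+    # for a, b in chunk_ranges(len(doc1), 10):
+    #     part = fitz.open()
+    #     part.insertPDF(doc1, from_page=a, to_page=b)
+    #     part.save("part-%i.pdf" % a)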
+
+Embedding Data
+---------------
+
+PDFs can be used as containers for arbitrary data (executables, other PDFs, text or binary files, etc.) much like ZIP archives.
+
+PyMuPDF fully supports this feature via the :ref:`Document` methods and attributes whose names start with *embeddedFile*. For details read :ref:`Appendix 3`, consult the Wiki on `embedding files <https://github.com/pymupdf/PyMuPDF/wiki/Dealing-with-Embedded-Files>`_, or the example scripts `embedded-copy.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/embedded-copy.py>`_, `embedded-export.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/embedded-export.py>`_, `embedded-import.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/embedded-import.py>`_, and `embedded-list.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples/embedded-list.py>`_.
+
+
+Saving
+-------
+
+As mentioned above, :meth:`Document.save` will **always** save the document in its current state.
+
+You can write changes back to the **original PDF** by specifying option *incremental=True*. This process is (usually) **extremely fast**, since changes are **appended to the original file** without completely rewriting it.
+
+:meth:`Document.save` options correspond to options of MuPDF's command line utility *mutool clean*, see the following table.
+
+=================== =========== ==================================================
+**Save Option**     **mutool**  **Effect**
+=================== =========== ==================================================
+garbage=1           g           garbage collect unused objects
+garbage=2           gg          in addition to 1, compact :data:`xref` tables
+garbage=3           ggg         in addition to 2, merge duplicate objects
+garbage=4           gggg        in addition to 3, check streams for duplication
+clean=1             cs          clean and sanitize content streams
+deflate=1           z           deflate uncompressed streams
+ascii=1             a           convert binary data to ASCII format
+linear=1            l           create a linearized version
+expand=1            i           decompress images
+expand=2            f           decompress fonts
+expand=255          d           decompress all
+=================== =========== ==================================================
+
+For example, *mutool clean -ggggz file.pdf* yields excellent compression results. It corresponds to *doc.save(filename, garbage=4, deflate=1)*.
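+
+The correspondence in the table can also be expressed as code, e.g. to log an equivalent *mutool* command line. This is a sketch with a hypothetical helper *mutool_flags*. It covers only the option values listed in the table::
+
+    def mutool_flags(garbage=0, clean=0, deflate=0, ascii=0, linear=0, expand=0):
+        """Translate Document.save() options into 'mutool clean' flags."""
+        if garbage not in range(5):
+            raise ValueError("garbage must be 0, 1, 2, 3 or 4")
+        expand_letters = {0: "", 1: "i", 2: "f", 255: "d"}
+        if expand not in expand_letters:
+            raise ValueError("only expand values 0, 1, 2 and 255 are covered")
+        flags = "g" * garbage
+        flags += "cs" if clean else ""
+        flags += "z" if deflate else ""
+        flags += "a" if ascii else ""
+        flags += "l" if linear else ""
+        flags += expand_letters[expand]
+        return "-" + flags if flags else ""
+
+    # mutool_flags(garbage=4, deflate=1) returns "-ggggz"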
+
+Closing
+=========
+It is often desirable to "close" a document to relinquish control of the underlying file to the OS, while your program continues.
+
+This can be achieved by the :meth:`Document.close` method. Apart from closing the underlying file, buffer areas associated with the document will be freed.
+
+Further Reading
+================
+Also have a look at PyMuPDF's `Wiki <https://github.com/pymupdf/PyMuPDF/wiki>`_ pages. Especially those named in the sidebar under title **"Recipes"** cover over 15 topics written in "How-To" style.
+
+This document also contains a :ref:`FAQ`. This chapter is closely related to the aforementioned recipes, and it will be extended with more content over time.
+
+.. rubric:: Footnotes
+
+.. [#f1] PyMuPDF also lets you open several image file types just like normal documents. See section :ref:`ImageFiles` in chapter :ref:`Pixmap` for more comments.
+
+.. [#f2] :meth:`Page.getText` is a convenience wrapper for several methods of another PyMuPDF class, :ref:`TextPage`. The names of these methods correspond to the argument string passed to :meth:`Page.getText`. For example, *Page.getText("dict")* is equivalent to *TextPage.extractDICT()*.
+
+.. [#f3] "Sequences" are Python objects conforming to the sequence protocol. These objects implement the methods *__len__()* and *__getitem__()*. Best known examples are Python tuples and lists. But *array.array*, *numpy.array* and PyMuPDF's "geometry" objects (:ref:`Algebra`) are sequences, too. Refer to :ref:`SequenceTypes` for details.
diff --git a/docs/vars.rst b/docs/vars.rst

new file mode 100644 (file)

index 0000000..8b1ed89
--- /dev/null
+++ b/docs/vars.rst
@@ -0,0 +1,449 @@
+===============================
+Constants and Enumerations
+===============================
+Constants and enumerations of MuPDF as implemented by PyMuPDF. Each of the following variables is accessible as *fitz.variable*.
+
+
+Constants
+---------
+
+.. py:data:: Base14_Fonts
+
+    Predefined Python list of valid :ref:`Base-14-Fonts`.
+
+    :rtype: list
+
+.. py:data:: csRGB
+
+    Predefined RGB colorspace *fitz.Colorspace(fitz.CS_RGB)*.
+
+    :rtype: :ref:`Colorspace`
+
+.. py:data:: csGRAY
+
+    Predefined GRAY colorspace *fitz.Colorspace(fitz.CS_GRAY)*.
+
+    :rtype: :ref:`Colorspace`
+
+.. py:data:: csCMYK
+
+    Predefined CMYK colorspace *fitz.Colorspace(fitz.CS_CMYK)*.
+
+    :rtype: :ref:`Colorspace`
+
+.. py:data:: CS_RGB
+
+    1 -- Type of :ref:`Colorspace` is RGB
+
+    :rtype: int
+
+.. py:data:: CS_GRAY
+
+    2 -- Type of :ref:`Colorspace` is GRAY
+
+    :rtype: int
+
+.. py:data:: CS_CMYK
+
+    3 -- Type of :ref:`Colorspace` is CMYK
+
+    :rtype: int
+
+.. py:data:: VersionBind
+
+    'x.xx.x' -- version of PyMuPDF (these bindings)
+
+    :rtype: string
+
+.. py:data:: VersionFitz
+
+    'x.xxx' -- version of MuPDF
+
+    :rtype: string
+
+.. py:data:: VersionDate
+
+    ISO timestamp *YYYY-MM-DD HH:MM:SS* when these bindings were built.
+
+    :rtype: string
+
+.. Note:: The docstring of *fitz* contains the above information. It can be retrieved with *print(fitz.__doc__)* and should look like this: *PyMuPDF 1.10.0: Python bindings for the MuPDF 1.10 library, built on 2016-11-30 13:09:13*.
+
+.. py:data:: version
+
+    (VersionBind, VersionFitz, timestamp) -- combined version information where *timestamp* is the generation point in time formatted as "YYYYMMDDhhmmss".
+
+    :rtype: tuple
+
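+As an example, the *timestamp* item can be converted into a *datetime* object with the standard library. This sketch assumes the "YYYYMMDDhhmmss" format described above. *build_time* is a hypothetical helper::
+
+    from datetime import datetime
+
+    def build_time(timestamp):
+        """Parse a 'YYYYMMDDhhmmss' string into a datetime object."""
+        return datetime.strptime(timestamp, "%Y%m%d%H%M%S")
+
+    # usage: build_time(fitz.version[2])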
+
+.. _PermissionCodes:
+
+Document Permissions
+----------------------------
+
+====================== =======================================================================
+Code                   Permitted Action
+====================== =======================================================================
+PDF_PERM_PRINT         Print the document
+PDF_PERM_MODIFY        Modify the document's contents
+PDF_PERM_COPY          Copy or otherwise extract text and graphics
+PDF_PERM_ANNOTATE      Add or modify text annotations and interactive form fields
+PDF_PERM_FORM          Fill in forms and sign the document
+PDF_PERM_ACCESSIBILITY Obsolete, always permitted
+PDF_PERM_ASSEMBLE      Insert, rotate, or delete pages, bookmarks, thumbnail images
+PDF_PERM_PRINT_HQ      High quality printing
+====================== =======================================================================
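+
+Permissions are combined as bit flags in one integer. The sketch below decodes such an integer into code names. It assumes the bit values of the PDF specification's user access permissions (bits 3 to 6 and 9 to 12, counted from 1). This is an assumption for illustration: in real code, prefer the *fitz.PDF_PERM_...* constants themselves. *permission_names* is a hypothetical helper::
+
+    # bit values assumed from the PDF specification (bit k has value 1 << (k - 1))
+    PERMISSIONS = {
+        "PDF_PERM_PRINT": 1 << 2,
+        "PDF_PERM_MODIFY": 1 << 3,
+        "PDF_PERM_COPY": 1 << 4,
+        "PDF_PERM_ANNOTATE": 1 << 5,
+        "PDF_PERM_FORM": 1 << 8,
+        "PDF_PERM_ACCESSIBILITY": 1 << 9,
+        "PDF_PERM_ASSEMBLE": 1 << 10,
+        "PDF_PERM_PRINT_HQ": 1 << 11,
+    }
+
+    def permission_names(value):
+        """Return the names of all permission bits set in value."""
+        return [name for name, bit in PERMISSIONS.items() if value & bit]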
+
+.. _EncryptionMethods:
+
+PDF encryption method codes
+----------------------------
+
+=================== ====================================================
+Code                Meaning
+=================== ====================================================
+PDF_ENCRYPT_KEEP    do not change
+PDF_ENCRYPT_NONE    remove any encryption
+PDF_ENCRYPT_RC4_40  RC4 40 bit
+PDF_ENCRYPT_RC4_128 RC4 128 bit
+PDF_ENCRYPT_AES_128 *Advanced Encryption Standard* 128 bit
+PDF_ENCRYPT_AES_256 *Advanced Encryption Standard* 256 bit
+PDF_ENCRYPT_UNKNOWN unknown
+=================== ====================================================
+
+.. _FontExtensions:
+
+Font File Extensions
+-----------------------
+The table shows the file extensions you should use when extracting fonts from a PDF file.
+
+==== ============================================================================
+Ext  Description
+==== ============================================================================
+ttf  TrueType font
+pfa  PostScript font in ASCII format (various subtypes)
+cff  Type1C font (compressed font equivalent to Type1)
+cid  character identifier font (postscript format)
+otf  OpenType font
+n/a  built-in font (:ref:`Base-14-Fonts` or CJK: cannot be extracted)
+==== ============================================================================
+
+.. _TextAlign:
+
+Text Alignment
+-----------------------
+.. py:data:: TEXT_ALIGN_LEFT
+
+    0 -- align left.
+
+.. py:data:: TEXT_ALIGN_CENTER
+
+    1 -- align center.
+
+.. py:data:: TEXT_ALIGN_RIGHT
+
+    2 -- align right.
+
+.. py:data:: TEXT_ALIGN_JUSTIFY
+
+    3 -- align justify.
+
+.. _TextPreserve:
+
+Preserve Text Flags
+--------------------
+Options controlling the amount of data a text device parses into a :ref:`TextPage`.
+
+.. py:data:: TEXT_PRESERVE_LIGATURES
+
+    1 -- If set, ligatures are passed through to the application in their original form. Otherwise ligatures are expanded into their constituent parts, e.g. the ligature ffi is expanded into three separate characters f, f and i.
+
+.. py:data:: TEXT_PRESERVE_WHITESPACE
+
+    2 -- If set, whitespace is passed through to the application in its original form. Otherwise any type of horizontal whitespace (including horizontal tabs) will be replaced with space characters of variable width.
+
+.. py:data:: TEXT_PRESERVE_IMAGES
+
+    4 -- If set, images will be stored in the structured text.
+
+.. py:data:: TEXT_INHIBIT_SPACES
+
+    8 -- If set, we will not try to add missing space characters where there are large gaps between characters.
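+
+These flags are bit values, so they are combined with *|* and tested with *&*. The sketch below mirrors the four values documented above in local constants (in real code use *fitz.TEXT_PRESERVE_LIGATURES* and so on). *flag_names* is a hypothetical helper::
+
+    TEXT_FLAGS = {
+        "TEXT_PRESERVE_LIGATURES": 1,
+        "TEXT_PRESERVE_WHITESPACE": 2,
+        "TEXT_PRESERVE_IMAGES": 4,
+        "TEXT_INHIBIT_SPACES": 8,
+    }
+
+    def flag_names(flags, table=TEXT_FLAGS):
+        """Return the names of all flags in table that are set in flags."""
+        return [name for name, bit in table.items() if flags & bit]
+
+    # keep ligatures and whitespace, i.e. flags = 1 | 2 == 3
+    flags = TEXT_FLAGS["TEXT_PRESERVE_LIGATURES"] | TEXT_FLAGS["TEXT_PRESERVE_WHITESPACE"]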
+
+
+.. _linkDest Kinds:
+
+Link Destination Kinds
+-----------------------
+Possible values of :attr:`linkDest.kind` (link destination kind). For details consult :ref:`AdobeManual`, chapter 8.2, p. 581.
+
+.. py:data:: LINK_NONE
+
+    0 -- No destination. Indicates a dummy link.
+
+    :rtype: int
+
+.. py:data:: LINK_GOTO
+
+    1 -- Points to a place in this document.
+
+    :rtype: int
+
+.. py:data:: LINK_URI
+
+    2 -- Points to a URI -- typically a resource specified with internet syntax.
+
+    :rtype: int
+
+.. py:data:: LINK_LAUNCH
+
+    3 -- Launch (open) another file (of any "executable" type).
+
+    :rtype: int
+
+.. py:data:: LINK_NAMED
+
+    4 -- Points to a named location.
+
+    :rtype: int
+
+.. py:data:: LINK_GOTOR
+
+    5 -- Points to a place in another PDF document.
+
+    :rtype: int
+
+.. _linkDest Flags:
+
+Link Destination Flags
+-------------------------
+
+.. Note:: The rightmost byte of this integer is a bit field, so test the truth of these bits with the *&* operator.
+
+.. py:data:: LINK_FLAG_L_VALID
+
+    1  (bit 0) Top left x value is valid
+
+    :rtype: int
+
+.. py:data:: LINK_FLAG_T_VALID
+
+    2  (bit 1) Top left y value is valid
+
+    :rtype: int
+
+.. py:data:: LINK_FLAG_R_VALID
+
+    4  (bit 2) Bottom right x value is valid
+
+    :rtype: int
+
+.. py:data:: LINK_FLAG_B_VALID
+
+    8  (bit 3) Bottom right y value is valid
+
+    :rtype: int
+
+.. py:data:: LINK_FLAG_FIT_H
+
+    16 (bit 4) Horizontal fit
+
+    :rtype: int
+
+.. py:data:: LINK_FLAG_FIT_V
+
+    32 (bit 5) Vertical fit
+
+    :rtype: int
+
+.. py:data:: LINK_FLAG_R_IS_ZOOM
+
+    64 (bit 6) Bottom right x is a zoom figure
+
+    :rtype: int
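+
+As the note above says, test these bits with *&*. The following sketch mirrors the documented values in local constants (in real code use the *fitz.LINK_FLAG_...* names). *valid_coordinates* is a hypothetical helper::
+
+    LINK_FLAG_L_VALID = 1
+    LINK_FLAG_T_VALID = 2
+    LINK_FLAG_R_VALID = 4
+    LINK_FLAG_B_VALID = 8
+
+    def valid_coordinates(flags):
+        """Tell which corner coordinates of a link destination are valid."""
+        return {
+            "x0": bool(flags & LINK_FLAG_L_VALID),  # top left x
+            "y0": bool(flags & LINK_FLAG_T_VALID),  # top left y
+            "x1": bool(flags & LINK_FLAG_R_VALID),  # bottom right x
+            "y1": bool(flags & LINK_FLAG_B_VALID),  # bottom right y
+        }
+
+    # usage, assuming ld is a linkDest object: valid_coordinates(ld.flags)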
+
+
+Annotation Related Constants
+-----------------------------
+See chapter 8.4.5, p. 615 of the :ref:`AdobeManual` for details.
+
+.. _AnnotationTypes:
+
+Annotation Types
+~~~~~~~~~~~~~~~~~
+These identifiers also cover **links** and **widgets**: the PDF specification technically handles them all in the same way, whereas **MuPDF** (and PyMuPDF) treats them as three basically different types of objects.
+
+::
+
+    PDF_ANNOT_TEXT 0
+    PDF_ANNOT_LINK 1  # <=== Link object in PyMuPDF
+    PDF_ANNOT_FREE_TEXT 2
+    PDF_ANNOT_LINE 3
+    PDF_ANNOT_SQUARE 4
+    PDF_ANNOT_CIRCLE 5
+    PDF_ANNOT_POLYGON 6
+    PDF_ANNOT_POLY_LINE 7
+    PDF_ANNOT_HIGHLIGHT 8
+    PDF_ANNOT_UNDERLINE 9
+    PDF_ANNOT_SQUIGGLY 10
+    PDF_ANNOT_STRIKE_OUT 11
+    PDF_ANNOT_REDACT 12
+    PDF_ANNOT_STAMP 13
+    PDF_ANNOT_CARET 14
+    PDF_ANNOT_INK 15
+    PDF_ANNOT_POPUP 16
+    PDF_ANNOT_FILE_ATTACHMENT 17
+    PDF_ANNOT_SOUND 18
+    PDF_ANNOT_MOVIE 19
+    PDF_ANNOT_WIDGET 20  # <=== Widget object in PyMuPDF
+    PDF_ANNOT_SCREEN 21
+    PDF_ANNOT_PRINTER_MARK 22
+    PDF_ANNOT_TRAP_NET 23
+    PDF_ANNOT_WATERMARK 24
+    PDF_ANNOT_3D 25
+    PDF_ANNOT_UNKNOWN -1
+
+.. _AnnotationFlags:
+
+Annotation Flag Bits
+~~~~~~~~~~~~~~~~~~~~~
+::
+
+    PDF_ANNOT_IS_INVISIBLE 1 << (1-1)
+    PDF_ANNOT_IS_HIDDEN 1 << (2-1)
+    PDF_ANNOT_IS_PRINT 1 << (3-1)
+    PDF_ANNOT_IS_NO_ZOOM 1 << (4-1)
+    PDF_ANNOT_IS_NO_ROTATE 1 << (5-1)
+    PDF_ANNOT_IS_NO_VIEW 1 << (6-1)
+    PDF_ANNOT_IS_READ_ONLY 1 << (7-1)
+    PDF_ANNOT_IS_LOCKED 1 << (8-1)
+    PDF_ANNOT_IS_TOGGLE_NO_VIEW 1 << (9-1)
+    PDF_ANNOT_IS_LOCKED_CONTENTS 1 << (10-1)
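The ``1 << (n-1)`` notation maps the PDF specification's 1-based bit position *n* to an integer value. A minimal sketch of setting, clearing and testing such flags with plain bitwise operators; the constants are a subset of the table defined locally, and the helpers ``set_flag``, ``clear_flag`` and ``has_flag`` are hypothetical.

```python
# Bit n of the PDF specification (counted from 1) has the value 1 << (n - 1).
PDF_ANNOT_IS_INVISIBLE = 1 << (1 - 1)  # 1
PDF_ANNOT_IS_HIDDEN = 1 << (2 - 1)     # 2
PDF_ANNOT_IS_PRINT = 1 << (3 - 1)      # 4
PDF_ANNOT_IS_LOCKED = 1 << (8 - 1)     # 128


def set_flag(flags, bit):
    """Return *flags* with *bit* switched on."""
    return flags | bit


def clear_flag(flags, bit):
    """Return *flags* with *bit* switched off."""
    return flags & ~bit


def has_flag(flags, bit):
    """Return True if *bit* is set in *flags*."""
    return bool(flags & bit)


flags = set_flag(PDF_ANNOT_IS_PRINT, PDF_ANNOT_IS_LOCKED)  # 4 | 128 == 132
flags = clear_flag(flags, PDF_ANNOT_IS_LOCKED)              # back to 4
print(flags, has_flag(flags, PDF_ANNOT_IS_PRINT), has_flag(flags, PDF_ANNOT_IS_HIDDEN))
# 4 True False
```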
+
+.. _AnnotationLineEnds:
+
+Annotation Line Ending Styles
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+::
+
+    PDF_ANNOT_LE_NONE 0
+    PDF_ANNOT_LE_SQUARE 1
+    PDF_ANNOT_LE_CIRCLE 2
+    PDF_ANNOT_LE_DIAMOND 3
+    PDF_ANNOT_LE_OPEN_ARROW 4
+    PDF_ANNOT_LE_CLOSED_ARROW 5
+    PDF_ANNOT_LE_BUTT 6
+    PDF_ANNOT_LE_R_OPEN_ARROW 7
+    PDF_ANNOT_LE_R_CLOSED_ARROW 8
+    PDF_ANNOT_LE_SLASH 9
+
+
+Widget Constants
+-----------------
+
+.. _WidgetTypes:
+
+Widget Types (*field_type*)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+::
+
+    PDF_WIDGET_TYPE_UNKNOWN 0
+    PDF_WIDGET_TYPE_BUTTON 1
+    PDF_WIDGET_TYPE_CHECKBOX 2
+    PDF_WIDGET_TYPE_COMBOBOX 3
+    PDF_WIDGET_TYPE_LISTBOX 4
+    PDF_WIDGET_TYPE_RADIOBUTTON 5
+    PDF_WIDGET_TYPE_SIGNATURE 6
+    PDF_WIDGET_TYPE_TEXT 7
+
+Text Widget Subtypes (*text_format*)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+::
+
+    PDF_WIDGET_TX_FORMAT_NONE 0
+    PDF_WIDGET_TX_FORMAT_NUMBER 1
+    PDF_WIDGET_TX_FORMAT_SPECIAL 2
+    PDF_WIDGET_TX_FORMAT_DATE 3
+    PDF_WIDGET_TX_FORMAT_TIME 4
+
+
+Widget flags (*field_flags*)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+**Common to all field types**::
+
+    PDF_FIELD_IS_READ_ONLY 1
+    PDF_FIELD_IS_REQUIRED 1 << 1
+    PDF_FIELD_IS_NO_EXPORT 1 << 2
+
+**Text widgets**::
+
+    PDF_TX_FIELD_IS_MULTILINE  1 << 12
+    PDF_TX_FIELD_IS_PASSWORD  1 << 13
+    PDF_TX_FIELD_IS_FILE_SELECT  1 << 20
+    PDF_TX_FIELD_IS_DO_NOT_SPELL_CHECK  1 << 22
+    PDF_TX_FIELD_IS_DO_NOT_SCROLL  1 << 23
+    PDF_TX_FIELD_IS_COMB  1 << 24
+    PDF_TX_FIELD_IS_RICH_TEXT  1 << 25
+
+**Button widgets**::
+
+    PDF_BTN_FIELD_IS_NO_TOGGLE_TO_OFF  1 << 14
+    PDF_BTN_FIELD_IS_RADIO  1 << 15
+    PDF_BTN_FIELD_IS_PUSHBUTTON  1 << 16
+    PDF_BTN_FIELD_IS_RADIOS_IN_UNISON  1 << 25
+
+**Choice widgets**::
+
+    PDF_CH_FIELD_IS_COMBO  1 << 17
+    PDF_CH_FIELD_IS_EDIT  1 << 18
+    PDF_CH_FIELD_IS_SORT  1 << 19
+    PDF_CH_FIELD_IS_MULTI_SELECT  1 << 21
+    PDF_CH_FIELD_IS_DO_NOT_SPELL_CHECK  1 << 22
+    PDF_CH_FIELD_IS_COMMIT_ON_SEL_CHANGE  1 << 26
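Some bit positions are reused across field types: ``1 << 25`` means "rich text" for a text field but "radios in unison" for a button. A flag value can therefore only be interpreted together with the field type. A minimal sketch, assuming the PDF specification's rules for buttons (the pushbutton bit takes precedence, neither bit means a check box) and for choice fields (the combo bit distinguishes combo boxes from list boxes). The helpers ``button_kind`` and ``choice_kind`` are hypothetical.

```python
# Flag values copied from the tables above (defined locally for illustration).
PDF_FIELD_IS_READ_ONLY = 1
PDF_BTN_FIELD_IS_RADIO = 1 << 15
PDF_BTN_FIELD_IS_PUSHBUTTON = 1 << 16
PDF_CH_FIELD_IS_COMBO = 1 << 17
PDF_TX_FIELD_IS_RICH_TEXT = 1 << 25
PDF_BTN_FIELD_IS_RADIOS_IN_UNISON = 1 << 25  # same bit as RICH_TEXT


def button_kind(field_flags):
    """Classify a button field from its flags."""
    if field_flags & PDF_BTN_FIELD_IS_PUSHBUTTON:
        return "pushbutton"
    if field_flags & PDF_BTN_FIELD_IS_RADIO:
        return "radio button"
    return "check box"  # neither flag set


def choice_kind(field_flags):
    """A choice field is a combo box if the COMBO bit is set, else a list box."""
    return "combo box" if field_flags & PDF_CH_FIELD_IS_COMBO else "list box"


# Flags common to all field types combine freely with type-specific ones:
flags = PDF_FIELD_IS_READ_ONLY | PDF_BTN_FIELD_IS_RADIO
print(button_kind(flags), bool(flags & PDF_FIELD_IS_READ_ONLY))  # radio button True
```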
+
+
+.. _BlendModes:
+
+PDF Standard Blend Modes
+----------------------------
+
+For an explanation see :ref:`AdobeManual`, page 520::
+
+    PDF_BM_Color "Color"
+    PDF_BM_ColorBurn "ColorBurn"
+    PDF_BM_ColorDodge "ColorDodge"
+    PDF_BM_Darken "Darken"
+    PDF_BM_Difference "Difference"
+    PDF_BM_Exclusion "Exclusion"
+    PDF_BM_HardLight "HardLight"
+    PDF_BM_Hue "Hue"
+    PDF_BM_Lighten "Lighten"
+    PDF_BM_Luminosity "Luminosity"
+    PDF_BM_Multiply "Multiply"
+    PDF_BM_Normal "Normal"
+    PDF_BM_Overlay "Overlay"
+    PDF_BM_Saturation "Saturation"
+    PDF_BM_Screen "Screen"
+    PDF_BM_SoftLight "Softlight"
+
+
+.. _StampIcons:
+
+Stamp Annotation Icons
+----------------------------
+MuPDF has defined the following icons for **rubber stamp** annotations::
+
+    STAMP_Approved 0
+    STAMP_AsIs 1
+    STAMP_Confidential 2
+    STAMP_Departmental 3
+    STAMP_Experimental 4
+    STAMP_Expired 5
+    STAMP_Final 6
+    STAMP_ForComment 7
+    STAMP_ForPublicRelease 8
+    STAMP_NotApproved 9
+    STAMP_NotForPublicRelease 10
+    STAMP_Sold 11
+    STAMP_TopSecret 12
+    STAMP_Draft 13
diff --git a/docs/version.rst b/docs/version.rst

new file mode 100644 (file)

index 0000000..d528198
--- /dev/null
+++ b/docs/version.rst
@@ -0,0 +1,6 @@
+Covered Version
+--------------------
+
+This documentation covers PyMuPDF v1.17.4 features as of **2020-07-20 18:09:40**.
+
+.. note:: The major and minor versions of **PyMuPDF** and **MuPDF** will always be the same. Only the third qualifier (patch level) may deviate from that of MuPDF.
+\ No newline at end of file
diff --git a/docs/wheelnames.txt b/docs/wheelnames.txt

new file mode 100644 (file)

index 0000000..a3744da
--- /dev/null
+++ b/docs/wheelnames.txt
@@ -0,0 +1,21 @@
+PyMuPDF-x.xx.xx-cp27-cp27m-macosx_10_9_x86_64.whl
+PyMuPDF-x.xx.xx-cp27-cp27m-manylinux2010_x86_64.whl
+PyMuPDF-x.xx.xx-cp27-cp27m-win32.whl
+PyMuPDF-x.xx.xx-cp27-cp27m-win_amd64.whl
+PyMuPDF-x.xx.xx-cp27-cp27mu-manylinux2010_x86_64.whl
+PyMuPDF-x.xx.xx-cp35-cp35m-macosx_10_9_x86_64.whl
+PyMuPDF-x.xx.xx-cp35-cp35m-manylinux2010_x86_64.whl
+PyMuPDF-x.xx.xx-cp35-cp35m-win32.whl
+PyMuPDF-x.xx.xx-cp35-cp35m-win_amd64.whl
+PyMuPDF-x.xx.xx-cp36-cp36m-macosx_10_9_x86_64.whl
+PyMuPDF-x.xx.xx-cp36-cp36m-manylinux2010_x86_64.whl
+PyMuPDF-x.xx.xx-cp36-cp36m-win32.whl
+PyMuPDF-x.xx.xx-cp36-cp36m-win_amd64.whl
+PyMuPDF-x.xx.xx-cp37-cp37m-macosx_10_9_x86_64.whl
+PyMuPDF-x.xx.xx-cp37-cp37m-manylinux2010_x86_64.whl
+PyMuPDF-x.xx.xx-cp37-cp37m-win32.whl
+PyMuPDF-x.xx.xx-cp37-cp37m-win_amd64.whl
+PyMuPDF-x.xx.xx-cp38-cp38-macosx_10_9_x86_64.whl
+PyMuPDF-x.xx.xx-cp38-cp38-manylinux2010_x86_64.whl
+PyMuPDF-x.xx.xx-cp38-cp38-win32.whl
+PyMuPDF-x.xx.xx-cp38-cp38-win_amd64.whl
diff --git a/docs/widget.rst b/docs/widget.rst

new file mode 100644 (file)

index 0000000..48dffe5
--- /dev/null
+++ b/docs/widget.rst
@@ -0,0 +1,168 @@
+.. _Widget:
+
+================
+Widget
+================
+
+This class represents a PDF form field, also called a "widget". Fields are a special case of annotations that allow users with limited permissions to enter information in a PDF, primarily for filling out forms.
+
+Like annotations, widgets live on PDF pages. The first widget on a page is accessible via :attr:`Page.firstWidget`, and subsequent widgets via the :attr:`Widget.next` property.
+
+*(Changed in version 1.16.0)* MuPDF no longer treats widgets as a subset of general annotations. Consequently, :attr:`Page.firstAnnot` and :attr:`Annot.next` deliver non-widget annotations exclusively and are *None* if only form fields exist on a page. Conversely, :attr:`Page.firstWidget` and :attr:`Widget.next` only deliver widgets. This design decision is purely internal to MuPDF; technically, links, annotations and fields have a lot in common and continue to share most of their code within (Py-) MuPDF.
+
+
+**Class API**
+
+.. class:: Widget
+
+    .. method:: update
+
+       After any changes to a widget, this method **must be used** to store them in the PDF [#f1]_.
+
+    .. method:: reset
+
+       Reset the field's value to its default -- if defined -- or remove it. Do not forget to issue :meth:`update` afterwards.
+
+    .. attribute:: next
+
+       Points to the next form field on the page.
+
+    .. attribute:: border_color
+
+       A list of up to 4 floats defining the color of the field's border. Default value is *None*, which causes border style and border width to be ignored.
+
+    .. attribute:: border_style
+
+       A string defining the line style of the field's border. See :attr:`Annot.border`. Default is "s" ("Solid") -- a continuous line. Only the first character (upper or lower case) will be regarded when creating a widget.
+
+    .. attribute:: border_width
+
+       A float defining the width of the border line. Default is 1.
+
+    .. attribute:: border_dashes
+
+       A list/tuple of integers defining the dash properties of the border line. This is only meaningful if *border_style == "D"* and :attr:`border_color` is provided.
+
+    .. attribute:: choice_values
+
+       Python sequence of strings defining the valid choices of list boxes and combo boxes. For these widgets, this property is mandatory and must contain at least two items. Ignored for other types.
+
+    .. attribute:: field_name
+
+       A mandatory string defining the field's name. No checking for duplicates takes place.
+
+    .. attribute:: field_label
+
+       An optional string containing an "alternate" field name. Typically used for any notes, help on field usage, etc. Default is the field name.
+
+    .. attribute:: field_value
+
+       The value of the field.
+
+    .. attribute:: field_flags
+
+       An integer defining a large number of properties of a field. Handle this attribute with care.
+
+    .. attribute:: field_type
+
+       A mandatory integer defining the field type. This is a value in the range of 1 to 7 (see :ref:`WidgetTypes`). It cannot be changed when updating the widget.
+
+    .. attribute:: field_type_string
+
+       A string describing (and derived from) the field type.
+
+    .. attribute:: fill_color
+
+       A list of up to 4 floats defining the field's background color.
+
+    .. attribute:: button_caption
+
+       The caption string of a button-type field.
+
+    .. attribute:: is_signed
+
+       A bool indicating the status of a signature field, else *None*.
+
+    .. attribute:: rect
+
+       The rectangle containing the field.
+
+    .. attribute:: text_color
+
+       A list of **1, 3 or 4 floats** defining the text color. Default value is black (``[0, 0, 0]``).
+
+    .. attribute:: text_font
+
+       A string defining the font to be used. Default and replacement for invalid values is *"Helv"*. For valid font reference names see the table below.
+
+    .. attribute:: text_fontsize
+
+       A float defining the text fontsize. Default value is zero, which causes PDF viewer software to dynamically choose a size suitable for the field's rectangle and text amount.
+
+    .. attribute:: text_maxlen
+
+       An integer defining the maximum number of text characters. PDF viewers should not accept longer text.
+
+    .. attribute:: text_type
+
+       An integer defining acceptable text types (e.g. numeric, date, time, etc.). For reference only for the time being -- will be ignored when creating or updating widgets.
+
+    .. attribute:: xref
+
+       The PDF :data:`xref` of the widget.
+
+    .. attribute:: script
+
+       *(New in version 1.16.12)* JavaScript text (unicode) for an action associated with the widget, or *None*. This is the only script action supported for **button type** widgets.
+
+    .. attribute:: script_stroke
+
+       *(New in version 1.16.12)* JavaScript text (unicode) to be performed when the user types a key-stroke into a text field or combo box or modifies the selection in a scrollable list box. This action can check the keystroke for validity and reject or modify it. *None* if not present.
+
+    .. attribute:: script_format
+
+       *(New in version 1.16.12)* JavaScript text (unicode) to be performed before the field is formatted to display its current value. This action can modify the field’s value before formatting. *None* if not present.
+
+    .. attribute:: script_change
+
+       *(New in version 1.16.12)* JavaScript text (unicode) to be performed when the field’s value is changed. This action can check the new value for validity. *None* if not present.
+
+    .. attribute:: script_calc
+
+       *(New in version 1.16.12)* JavaScript text (unicode) to be performed to recalculate the value of this field when that of another field changes. *None* if not present.
+
+    .. note::
+       1. For **adding** or **changing** one of the above scripts, just put the appropriate JavaScript source code in the widget attribute. To **remove** a script, set the respective attribute to *None*.
+       2. Button fields only support :attr:`script`. Other script entries will automatically be set to *None*.
+
+
+Standard Fonts for Widgets
+----------------------------------
+Widgets use their own resources object */DR*. A widget resources object must at least contain a */Font* object. Widget fonts are independent of page fonts. We currently support the 14 PDF base fonts using the following fixed reference names, or any name of an already existing field font. When specifying a text font for new or changed widgets, **either** choose one in the first table column (upper and lower case supported), **or** one of the already existing form fonts. In the latter case, the spelling must match exactly.
+
+To find out already existing field fonts, inspect the list :attr:`Document.FormFonts`.
+
+============= =======================
+**Reference** **Base14 Fontname**
+============= =======================
+CoBI          Courier-BoldOblique
+CoBo          Courier-Bold
+CoIt          Courier-Oblique
+Cour          Courier
+HeBI          Helvetica-BoldOblique
+HeBo          Helvetica-Bold
+HeIt          Helvetica-Oblique
+Helv          Helvetica **(default)**
+Symb          Symbol
+TiBI          Times-BoldItalic
+TiBo          Times-Bold
+TiIt          Times-Italic
+TiRo          Times-Roman
+ZaDb          ZapfDingbats
+============= =======================
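The naming rules above (reference names in any case, existing form fonts with exact spelling, *"Helv"* as the replacement for invalid values) can be modeled as a small lookup. This is a sketch of the documented rules, not PyMuPDF's own implementation. The precedence (reference names checked before form fonts) is an assumption, ``resolve_widget_font`` is a hypothetical helper, and ``form_fonts`` stands in for a list such as :attr:`Document.FormFonts`.

```python
# Reference names -> base-14 font names, copied from the table above.
BASE14_REFS = {
    "CoBI": "Courier-BoldOblique",
    "CoBo": "Courier-Bold",
    "CoIt": "Courier-Oblique",
    "Cour": "Courier",
    "HeBI": "Helvetica-BoldOblique",
    "HeBo": "Helvetica-Bold",
    "HeIt": "Helvetica-Oblique",
    "Helv": "Helvetica",
    "Symb": "Symbol",
    "TiBI": "Times-BoldItalic",
    "TiBo": "Times-Bold",
    "TiIt": "Times-Italic",
    "TiRo": "Times-Roman",
    "ZaDb": "ZapfDingbats",
}
# lower-cased reference name -> canonical spelling, for case-insensitive lookup
_BY_LOWER = {ref.lower(): ref for ref in BASE14_REFS}


def resolve_widget_font(name, form_fonts=()):
    """Map a requested text font to the name a widget would use.

    Assumed precedence: base-14 reference names first (any case), then
    existing form fonts (exact spelling), else the default "Helv".
    """
    ref = _BY_LOWER.get(name.lower())
    if ref is not None:
        return ref
    if name in form_fonts:
        return name
    return "Helv"


print(resolve_widget_font("zadb"), resolve_widget_font("F1", ["F1"]), resolve_widget_font("Arial"))
# ZaDb F1 Helv
```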
+
+You are generally free to use any font for every widget. However, we recommend using *ZaDb* ("ZapfDingbats") and fontsize 0 for check boxes: typical viewers will then put a correctly sized check mark into the field's rectangle when it is clicked.
+
+.. rubric:: Footnotes
+
+.. [#f1] If you intend to re-access a new or updated field (e.g. for making a pixmap), make sure to reload the page first. Either close and re-open the document, or load another page first, or simply do ``page = doc.reload_page(page)``.
diff --git a/fitz/__init__.py b/fitz/__init__.py

new file mode 100644 (file)

index 0000000..e42df90
--- /dev/null
+++ b/fitz/__init__.py
@@ -0,0 +1,105 @@
+from __future__ import absolute_import, print_function
+import sys
+from fitz.fitz import *
+
+# define the supported colorspaces for convenience
+fitz.csRGB = fitz.Colorspace(fitz.CS_RGB)
+fitz.csGRAY = fitz.Colorspace(fitz.CS_GRAY)
+fitz.csCMYK = fitz.Colorspace(fitz.CS_CMYK)
+csRGB = fitz.csRGB
+csGRAY = fitz.csGRAY
+csCMYK = fitz.csCMYK
+
+# create the TOOLS object
+TOOLS = fitz.Tools()
+fitz.TOOLS = TOOLS
+
+if fitz.VersionFitz != fitz.TOOLS.mupdf_version():
+    v1 = fitz.VersionFitz.split(".")
+    v2 = fitz.TOOLS.mupdf_version().split(".")
+    if v1[:-1] != v2[:-1]:
+        raise ValueError(
+            "MuPDF library mismatch %s <> %s"
+            % (fitz.VersionFitz, fitz.TOOLS.mupdf_version())
+        )
+
+
+# copy functions to their respective fitz classes
+import fitz.utils
+
+# ------------------------------------------------------------------------------
+# Document
+# ------------------------------------------------------------------------------
+fitz.open = fitz.Document
+fitz.Document.getToC = fitz.utils.getToC
+fitz.Document._do_links = fitz.utils.do_links
+fitz.Document.getPagePixmap = fitz.utils.getPagePixmap
+fitz.Document.getPageText = fitz.utils.getPageText
+fitz.Document.setMetadata = fitz.utils.setMetadata
+fitz.Document.setToC = fitz.utils.setToC
+fitz.Document.searchPageFor = fitz.utils.searchPageFor
+fitz.Document.newPage = fitz.utils.newPage
+fitz.Document.insertPage = fitz.utils.insertPage
+fitz.Document.getCharWidths = fitz.utils.getCharWidths
+fitz.Document.scrub = fitz.utils.scrub
+
+# ------------------------------------------------------------------------------
+# Page
+# ------------------------------------------------------------------------------
+fitz.Page.apply_redactions = fitz.utils.apply_redactions
+fitz.Page.drawBezier = fitz.utils.drawBezier
+fitz.Page.drawCircle = fitz.utils.drawCircle
+fitz.Page.drawCurve = fitz.utils.drawCurve
+fitz.Page.drawLine = fitz.utils.drawLine
+fitz.Page.drawOval = fitz.utils.drawOval
+fitz.Page.drawPolyline = fitz.utils.drawPolyline
+fitz.Page.drawQuad = fitz.utils.drawQuad
+fitz.Page.drawRect = fitz.utils.drawRect
+fitz.Page.drawSector = fitz.utils.drawSector
+fitz.Page.drawSquiggle = fitz.utils.drawSquiggle
+fitz.Page.drawZigzag = fitz.utils.drawZigzag
+fitz.Page.getLinks = fitz.utils.getLinks
+fitz.Page.getPixmap = fitz.utils.getPixmap
+fitz.Page.getText = fitz.utils.getText
+fitz.Page.getTextBlocks = fitz.utils.getTextBlocks
+fitz.Page.getTextWords = fitz.utils.getTextWords
+fitz.Page.insertImage = fitz.utils.insertImage
+fitz.Page.insertLink = fitz.utils.insertLink
+fitz.Page.insertText = fitz.utils.insertText
+fitz.Page.insertTextbox = fitz.utils.insertTextbox
+fitz.Page.newShape = lambda x: fitz.utils.Shape(x)
+fitz.Page.searchFor = fitz.utils.searchFor
+fitz.Page.showPDFpage = fitz.utils.showPDFpage
+fitz.Page.updateLink = fitz.utils.updateLink
+fitz.Page.writeText = fitz.utils.writeText
+# ------------------------------------------------------------------------------
+# Rect
+# ------------------------------------------------------------------------------
+fitz.Rect.getRectArea = fitz.utils.getRectArea
+fitz.Rect.getArea = fitz.utils.getRectArea
+
+# ------------------------------------------------------------------------------
+# IRect
+# ------------------------------------------------------------------------------
+fitz.IRect.getRectArea = fitz.utils.getRectArea
+fitz.IRect.getArea = fitz.utils.getRectArea
+
+# ------------------------------------------------------------------------------
+# TextWriter
+# ------------------------------------------------------------------------------
+fitz.TextWriter.fillTextbox = fitz.utils.fillTextbox
+
+
+fitz.__doc__ = """
+PyMuPDF %s: Python bindings for the MuPDF %s library.
+Version date: %s.
+Built for Python %i.%i on %s (%i-bit).
+""" % (
+    fitz.VersionBind,
+    fitz.VersionFitz,
+    fitz.VersionDate,
+    sys.version_info[0],
+    sys.version_info[1],
+    sys.platform,
+    64 if sys.maxsize > 2 ** 32 else 32,
+)
diff --git a/fitz/__main__.py b/fitz/__main__.py

new file mode 100644 (file)

index 0000000..91111da
--- /dev/null
+++ b/fitz/__main__.py
@@ -0,0 +1,782 @@
+from __future__ import division, print_function
+
+import os
+import sys
+
+import fitz
+
+mycenter = lambda x: (" %s " % x).center(75, "-")
+
+
+def recoverpix(doc, item):
+    """Return image for a given XREF.
+    """
+    x = item[0]  # xref of PDF image
+    s = item[1]  # xref of its /SMask
+    if s == 0:  # no smask: use direct image output
+        return doc.extractImage(x)
+
+    def getimage(pix):
+        if pix.colorspace.n != 4:
+            return pix
+        tpix = fitz.Pixmap(fitz.csRGB, pix)
+        return tpix
+
+    # we need to reconstruct the alpha channel with the smask
+    pix1 = fitz.Pixmap(doc, x)
+    pix2 = fitz.Pixmap(doc, s)  # create pixmap of the /SMask entry
+
+    """Sanity check:
+    - both pixmaps must have the same rectangle
+    - both pixmaps must have alpha=0
+    - pix2 must consist of 1 byte per pixel
+    """
+    if not (pix1.irect == pix2.irect and pix1.alpha == pix2.alpha == 0 and pix2.n == 1):
+        print("Warning: unsupported /SMask %i for %i:" % (s, x))
+        print(pix2)
+        pix2 = None
+        return getimage(pix1)  # return the pixmap as is
+
+    pix = fitz.Pixmap(pix1)  # copy of pix1, with an alpha channel added
+    pix.setAlpha(pix2.samples)  # treat pix2.samples as the alpha values
+    pix1 = pix2 = None  # free temp pixmaps
+
+    # we may need to adjust something for CMYK pixmaps here:
+    return getimage(pix)
+
+
+def open_file(filename, password, show=False, pdf=True):
+    """Open and authenticate a document.
+    """
+    doc = fitz.open(filename)
+    if not doc.isPDF and pdf is True:
+        sys.exit("this command supports PDF files only")
+    rc = -1
+    if not doc.needsPass:
+        return doc
+    if password:
+        rc = doc.authenticate(password)
+        if not rc:
+            sys.exit("authentication unsuccessful")
+        if show is True:
+            print("authenticated as %s" % ("owner" if rc > 2 else "user"))
+    else:
+        sys.exit("'%s' requires a password" % doc.name)
+    return doc
+
+
+def print_dict(item):
+    """Print a Python dictionary.
+    """
+    l = max([len(k) for k in item.keys()]) + 1
+    for k, v in item.items():
+        msg = "%s: %s" % (k.rjust(l), v)
+        print(msg)
+    return
+
+
+def print_xref(doc, xref):
+    """Print an object given by XREF number.
+
+    Simulate the PDF source in "pretty" format.
+    For a stream also print its size.
+    """
+    print("%i 0 obj" % xref)
+    xref_str = doc.xrefObject(xref)
+    print(xref_str)
+    if doc.isStream(xref):
+        temp = xref_str.split()
+        try:
+            idx = temp.index("/Length") + 1
+            size = temp[idx]
+            if size.endswith("0 R"):
+                size = "unknown"
+        except:
+            size = "unknown"
+        print("stream\n...%s bytes" % size)
+        print("endstream")
+    print("endobj")
+
+
+def get_list(rlist, limit, what="page"):
+    """Transform a page / xref specification into a list of integers.
+
+    Args
+    ----
+        rlist: (str) the specification
+        limit: maximum number, i.e. number of pages, number of objects
+        what: a string to be used in error messages
+    Returns
+    -------
+        A list of integers representing the specification.
+    """
+    N = str(limit - 1)
+    rlist = rlist.replace("N", N).replace(" ", "")
+    rlist_arr = rlist.split(",")
+    out_list = []
+    for seq, item in enumerate(rlist_arr):
+        n = seq + 1
+        if item.isdecimal():  # a single integer
+            i = int(item)
+            if 1 <= i < limit:
+                out_list.append(int(item))
+            else:
+                sys.exit("bad %s specification at item %i" % (what, n))
+            continue
+        try:  # this must be a range now, and all of the following must work:
+            i1, i2 = item.split("-")  # will fail if not 2 items produced
+            i1 = int(i1)  # will fail on non-integers
+            i2 = int(i2)
+        except:
+            sys.exit("bad %s range specification at item %i" % (what, n))
+
+        if not (1 <= i1 < limit and 1 <= i2 < limit):
+            sys.exit("bad %s range specification at item %i" % (what, n))
+
+        if i1 == i2:  # just in case: a range of equal numbers
+            out_list.append(i1)
+            continue
+
+        if i1 < i2:  # first less than second
+            out_list += list(range(i1, i2 + 1))
+        else:  # first larger than second
+            out_list += list(range(i1, i2 - 1, -1))
+
+    return out_list
+
+
+def show(args):
+    doc = open_file(args.input, args.password, True)
+    size = os.path.getsize(args.input) / 1024
+    flag = "KB"
+    if size > 1000:
+        size /= 1024
+        flag = "MB"
+    size = round(size, 1)
+    meta = doc.metadata
+    print(
+        "'%s', pages: %i, objects: %i, %g %s, %s, encryption: %s"
+        % (
+            args.input,
+            doc.pageCount,
+            doc._getXrefLength() - 1,
+            size,
+            flag,
+            meta["format"],
+            meta["encryption"],
+        )
+    )
+    n = doc.isFormPDF
+    if n > 0:
+        s = doc.getSigFlags()
+        print(
+            "document contains %i root form fields and is %ssigned"
+            % (n, "not " if s != 3 else "")
+        )
+    n = doc.embeddedFileCount()
+    if n > 0:
+        print("document contains %i embedded files" % n)
+    print()
+    if args.catalog:
+        print(mycenter("PDF catalog"))
+        xref = doc.PDFCatalog()
+        print_xref(doc, xref)
+        print()
+    if args.metadata:
+        print(mycenter("PDF metadata"))
+        print_dict(doc.metadata)
+        print()
+    if args.xrefs:
+        print(mycenter("object information"))
+        xrefl = get_list(args.xrefs, doc._getXrefLength(), what="xref")
+        for xref in xrefl:
+            print_xref(doc, xref)
+            print()
+    if args.pages:
+        print(mycenter("page information"))
+        pagel = get_list(args.pages, doc.pageCount + 1)
+        for pno in pagel:
+            n = pno - 1
+            xref = doc._getPageXref(n)[0]
+            print("Page %i:" % pno)
+            print_xref(doc, xref)
+            print()
+    if args.trailer:
+        print(mycenter("PDF trailer"))
+        print(doc.PDFTrailer())
+        print()
+    doc.close()
+
+
+def clean(args):
+    doc = open_file(args.input, args.password, pdf=True)
+    encryption = args.encryption
+    encrypt = ("keep", "none", "rc4-40", "rc4-128", "aes-128", "aes-256").index(
+        encryption
+    )
+
+    if not args.pages:  # simple cleaning
+        doc.save(
+            args.output,
+            garbage=args.garbage,
+            deflate=args.compress,
+            pretty=args.pretty,
+            clean=args.sanitize,
+            ascii=args.ascii,
+            linear=args.linear,
+            encryption=encrypt,
+            owner_pw=args.owner,
+            user_pw=args.user,
+            permissions=args.permission,
+        )
+        return
+
+    # create sub document from page numbers
+    pages = get_list(args.pages, doc.pageCount + 1)
+    outdoc = fitz.open()
+    for pno in pages:
+        n = pno - 1
+        outdoc.insertPDF(doc, from_page=n, to_page=n)
+    outdoc.save(
+        args.output,
+        garbage=args.garbage,
+        deflate=args.compress,
+        pretty=args.pretty,
+        clean=args.sanitize,
+        ascii=args.ascii,
+        linear=args.linear,
+        encryption=encrypt,
+        owner_pw=args.owner,
+        user_pw=args.user,
+        permissions=args.permission,
+    )
+    doc.close()
+    outdoc.close()
+    return
+
+
+def doc_join(args):
+    """Join pages from several PDF documents.
+    """
+    doc_list = args.input  # a list of input PDFs
+    doc = fitz.open()  # output PDF
+    for src_item in doc_list:  # process one input PDF
+        src_list = src_item.split(",")
+        password = src_list[1] if len(src_list) > 1 else None
+        src = open_file(src_list[0], password, pdf=True)
+        pages = ",".join(src_list[2:])  # get 'pages' specifications
+        if pages:  # if anything there, retrieve a list of desired pages
+            page_list = get_list(pages, src.pageCount + 1)
+        else:  # take all pages
+            page_list = range(1, src.pageCount + 1)
+        for i in page_list:
+            doc.insertPDF(src, from_page=i - 1, to_page=i - 1)  # copy each source page
+        src.close()
+
+    doc.save(args.output, garbage=4, deflate=True)
+    doc.close()
+
+
+def embedded_copy(args):
+    """Copy embedded files between PDFs.
+    """
+    doc = open_file(args.input, args.password, pdf=True)
+    if not doc.can_save_incrementally() and (
+        not args.output or args.output == args.input
+    ):
+        sys.exit("cannot save PDF incrementally")
+    src = open_file(args.source, args.pwdsource)
+    names = set(args.name) if args.name else set()
+    src_names = set(src.embeddedFileNames())
+    if names:
+        if not names <= src_names:
+            sys.exit("not all names are contained in source")
+    else:
+        names = src_names
+    if not names:
+        sys.exit("nothing to copy")
+    intersect = names & set(
+        doc.embeddedFileNames()
+    )  # any equal name already in target?
+    if intersect:
+        sys.exit("following names already exist in receiving PDF: %s" % str(intersect))
+
+    for item in names:
+        info = src.embeddedFileInfo(item)
+        buff = src.embeddedFileGet(item)
+        doc.embeddedFileAdd(
+            item,
+            buff,
+            filename=info["filename"],
+            ufilename=info["ufilename"],
+            desc=info["desc"],
+        )
+        print("copied entry '%s' from '%s'" % (item, src.name))
+    src.close()
+    if args.output and args.output != args.input:
+        doc.save(args.output, garbage=3)
+    else:
+        doc.saveIncr()
+    doc.close()
+
+
+def embedded_del(args):
+    """Delete an embedded file entry.
+    """
+    doc = open_file(args.input, args.password, pdf=True)
+    if not doc.can_save_incrementally() and (
+        not args.output or args.output == args.input
+    ):
+        sys.exit("cannot save PDF incrementally")
+
+    try:
+        doc.embeddedFileDel(args.name)
+    except ValueError:
+        sys.exit("no such embedded file '%s'" % args.name)
+    if not args.output or args.output == args.input:
+        doc.saveIncr()
+    else:
+        doc.save(args.output, garbage=1)
+    doc.close()
+
+
+def embedded_get(args):
+    """Retrieve contents of an embedded file.
+    """
+    doc = open_file(args.input, args.password, pdf=True)
+    try:
+        stream = doc.embeddedFileGet(args.name)
+        d = doc.embeddedFileInfo(args.name)
+    except ValueError:
+        sys.exit("no such embedded file '%s'" % args.name)
+    filename = args.output if args.output else d["filename"]
+    output = open(filename, "wb")
+    output.write(stream)
+    output.close()
+    print("saved entry '%s' as '%s'" % (args.name, filename))
+    doc.close()
+
+
+def embedded_add(args):
+    """Insert a new embedded file.
+    """
+    doc = open_file(args.input, args.password, pdf=True)
+    if not doc.can_save_incrementally() and (
+        args.output is None or args.output == args.input
+    ):
+        sys.exit("cannot save PDF incrementally")
+
+    # refuse to overwrite an existing entry (and never delete it)
+    if args.name in doc.embeddedFileNames():
+        sys.exit("entry '%s' already exists" % args.name)
+
+    if not os.path.exists(args.path) or not os.path.isfile(args.path):
+        sys.exit("no such file '%s'" % args.path)
+    stream = open(args.path, "rb").read()
+    filename = args.path
+    ufilename = filename
+    if not args.desc:
+        desc = filename
+    else:
+        desc = args.desc
+    doc.embeddedFileAdd(
+        args.name, stream, filename=filename, ufilename=ufilename, desc=desc
+    )
+    if not args.output or args.output == args.input:
+        doc.saveIncr()
+    else:
+        doc.save(args.output, garbage=3)
+    doc.close()
+
+
+def embedded_upd(args):
+    """Update contents or metadata of an embedded file.
+    """
+    doc = open_file(args.input, args.password, pdf=True)
+    if not doc.can_save_incrementally() and (
+        args.output is None or args.output == args.input
+    ):
+        sys.exit("cannot save PDF incrementally")
+
+    try:
+        doc.embeddedFileInfo(args.name)
+    except:
+        sys.exit("no such embedded file '%s'" % args.name)
+
+    if (
+        args.path is not None
+        and os.path.exists(args.path)
+        and os.path.isfile(args.path)
+    ):
+        stream = open(args.path, "rb").read()
+    else:
+        stream = None
+
+    if args.filename:
+        filename = args.filename
+    else:
+        filename = None
+
+    if args.ufilename:
+        ufilename = args.ufilename
+    elif args.filename:
+        ufilename = args.filename
+    else:
+        ufilename = None
+
+    if args.desc:
+        desc = args.desc
+    else:
+        desc = None
+
+    doc.embeddedFileUpd(
+        args.name, stream, filename=filename, ufilename=ufilename, desc=desc
+    )
+    if args.output is None or args.output == args.input:
+        doc.saveIncr()
+    else:
+        doc.save(args.output, garbage=3)
+    doc.close()
+
+
+def embedded_list(args):
+    """List embedded files.
+    """
+    doc = open_file(args.input, args.password, pdf=True)
+    names = doc.embeddedFileNames()
+    if args.name is not None:
+        if args.name not in names:
+            sys.exit("no such embedded file '%s'" % args.name)
+        else:
+            print()
+            print(
+                "printing 1 of %i embedded file%s:"
+                % (len(names), "s" if len(names) > 1 else "")
+            )
+            print()
+            print_dict(doc.embeddedFileInfo(args.name))
+            print()
+            return
+    if not names:
+        print("'%s' contains no embedded files" % doc.name)
+        return
+    if len(names) > 1:
+        msg = "'%s' contains the following %i embedded files" % (doc.name, len(names))
+    else:
+        msg = "'%s' contains the following embedded file" % doc.name
+    print(msg)
+    print()
+    for name in names:
+        if not args.detail:
+            print(name)
+            continue
+        print_dict(doc.embeddedFileInfo(name))
+        print()
+    doc.close()
+
+
+def extract_objects(args):
+    """Extract images and / or fonts from a PDF.
+    """
+    if not args.fonts and not args.images:
+        sys.exit("neither fonts nor images requested")
+    doc = open_file(args.input, args.password, pdf=True)
+
+    if args.pages:
+        pages = get_list(args.pages, doc.pageCount + 1)
+    else:
+        pages = range(1, doc.pageCount + 1)
+
+    if not args.output:
+        out_dir = os.path.abspath(os.curdir)
+    else:
+        out_dir = args.output
+        if not (os.path.exists(out_dir) and os.path.isdir(out_dir)):
+            sys.exit("output directory %s does not exist" % out_dir)
+
+    font_xrefs = set()  # already saved fonts
+    image_xrefs = set()  # already saved images
+
+    for pno in pages:
+        if args.fonts:
+            itemlist = doc.getPageFontList(pno - 1)
+            for item in itemlist:
+                xref = item[0]
+                if xref not in font_xrefs:
+                    font_xrefs.add(xref)
+                    fontname, ext, _, buffer = doc.extractFont(xref)
+                    if ext == "n/a" or not buffer:
+                        continue
+                    outname = os.path.join(
+                        out_dir, fontname.replace(" ", "-") + "." + ext
+                    )
+                    outfile = open(outname, "wb")
+                    outfile.write(buffer)
+                    outfile.close()
+                    buffer = None
+        if args.images:
+            itemlist = doc.getPageImageList(pno - 1)
+            for item in itemlist:
+                xref = item[0]
+                if xref not in image_xrefs:
+                    image_xrefs.add(xref)
+                    pix = recoverpix(doc, item)
+                    if type(pix) is dict:
+                        ext = pix["ext"]
+                        imgdata = pix["image"]
+                        outname = os.path.join(out_dir, "img-%i.%s" % (xref, ext))
+                        outfile = open(outname, "wb")
+                        outfile.write(imgdata)
+                        outfile.close()
+                    else:
+                        outname = os.path.join(out_dir, "img-%i.png" % xref)
+                        pix2 = (
+                            pix
+                            if pix.colorspace.n < 4
+                            else fitz.Pixmap(fitz.csRGB, pix)
+                        )
+                        pix2.writeImage(outname)
+
+    if args.fonts:
+        print("saved %i fonts to '%s'" % (len(font_xrefs), out_dir))
+    if args.images:
+        print("saved %i images to '%s'" % (len(image_xrefs), out_dir))
+    doc.close()
+
+
+def main():
+    """Define command configurations.
+    """
+    import argparse
+
+    parser = argparse.ArgumentParser(
+        description=mycenter("Basic PyMuPDF Functions"), prog="fitz"
+    )
+    subps = parser.add_subparsers(
+        title="Subcommands", help="Enter 'command -h' for subcommand specific help"
+    )
+
+    # -------------------------------------------------------------------------
+    # 'show' command
+    # -------------------------------------------------------------------------
+    ps_show = subps.add_parser("show", description=mycenter("display PDF information"))
+    ps_show.add_argument("input", type=str, help="PDF filename")
+    ps_show.add_argument("-password", help="password")
+    ps_show.add_argument("-catalog", action="store_true", help="show PDF catalog")
+    ps_show.add_argument("-trailer", action="store_true", help="show PDF trailer")
+    ps_show.add_argument("-metadata", action="store_true", help="show PDF metadata")
+    ps_show.add_argument(
+        "-xrefs", type=str, help="show selected objects, format: 1,5-7,N"
+    )
+    ps_show.add_argument(
+        "-pages", type=str, help="show selected pages, format: 1,5-7,50-N"
+    )
+    ps_show.set_defaults(func=show)
+
+    # -------------------------------------------------------------------------
+    # 'clean' command
+    # -------------------------------------------------------------------------
+    ps_clean = subps.add_parser(
+        "clean", description=mycenter("optimize PDF or create sub-PDF if pages given")
+    )
+    ps_clean.add_argument("input", type=str, help="PDF filename")
+    ps_clean.add_argument("output", type=str, help="output PDF filename")
+    ps_clean.add_argument("-password", help="password")
+
+    ps_clean.add_argument(
+        "-encryption",
+        help="encryption method",
+        choices=("keep", "none", "rc4-40", "rc4-128", "aes-128", "aes-256"),
+        default="none",
+    )
+
+    ps_clean.add_argument("-owner", type=str, help="owner password")
+    ps_clean.add_argument("-user", type=str, help="user password")
+
+    ps_clean.add_argument(
+        "-garbage",
+        type=int,
+        help="garbage collection level",
+        choices=range(5),
+        default=0,
+    )
+
+    ps_clean.add_argument(
+        "-compress",
+        action="store_true",
+        default=False,
+        help="compress (deflate) output",
+    )
+
+    ps_clean.add_argument(
+        "-ascii", action="store_true", default=False, help="ASCII encode binary data"
+    )
+
+    ps_clean.add_argument(
+        "-linear",
+        action="store_true",
+        default=False,
+        help="format for fast web display",
+    )
+
+    ps_clean.add_argument(
+        "-permission", type=int, default=-1, help="integer with permission levels"
+    )
+
+    ps_clean.add_argument(
+        "-sanitize",
+        action="store_true",
+        default=False,
+        help="sanitize / clean contents",
+    )
+    ps_clean.add_argument(
+        "-pretty", action="store_true", default=False, help="prettify PDF structure"
+    )
+    ps_clean.add_argument(
+        "-pages", help="output selected pages, format: 1,5-7,50-N"
+    )
+    ps_clean.set_defaults(func=clean)
+
+    # -------------------------------------------------------------------------
+    # 'join' command
+    # -------------------------------------------------------------------------
+    ps_join = subps.add_parser(
+        "join",
+        description=mycenter("join PDF documents"),
+        epilog="specify each input as 'filename[,password[,pages]]'",
+    )
+    ps_join.add_argument("input", nargs="*", help="input filenames")
+    ps_join.add_argument("-output", required=True, help="output filename")
+    ps_join.set_defaults(func=doc_join)
+
+    # -------------------------------------------------------------------------
+    # 'extract' command
+    # -------------------------------------------------------------------------
+    ps_extract = subps.add_parser(
+        "extract", description=mycenter("extract images and fonts to disk")
+    )
+    ps_extract.add_argument("input", type=str, help="PDF filename")
+    ps_extract.add_argument("-images", action="store_true", help="extract images")
+    ps_extract.add_argument("-fonts", action="store_true", help="extract fonts")
+    ps_extract.add_argument(
+        "-output", help="folder to receive output, defaults to current directory"
+    )
+    ps_extract.add_argument("-password", help="password")
+    ps_extract.add_argument(
+        "-pages", type=str, help="consider these pages only, format: 1,5-7,50-N"
+    )
+    ps_extract.set_defaults(func=extract_objects)
+
+    # -------------------------------------------------------------------------
+    # 'embed-info' command
+    # -------------------------------------------------------------------------
+    ps_embed_info = subps.add_parser(
+        "embed-info", description=mycenter("list embedded files")
+    )
+    ps_embed_info.add_argument("input", help="PDF filename")
+    ps_embed_info.add_argument("-name", help="if given, report only this one")
+    ps_embed_info.add_argument("-detail", action="store_true", help="show detailed information")
+    ps_embed_info.add_argument("-password", help="password")
+    ps_embed_info.set_defaults(func=embedded_list)
+
+    # -------------------------------------------------------------------------
+    # 'embed-add' command
+    # -------------------------------------------------------------------------
+    ps_embed_add = subps.add_parser(
+        "embed-add", description=mycenter("add embedded file")
+    )
+    ps_embed_add.add_argument("input", help="PDF filename")
+    ps_embed_add.add_argument("-password", help="password")
+    ps_embed_add.add_argument(
+        "-output", help="output PDF filename, incremental save if none"
+    )
+    ps_embed_add.add_argument("-name", required=True, help="name of new entry")
+    ps_embed_add.add_argument("-path", required=True, help="path to data for new entry")
+    ps_embed_add.add_argument("-desc", help="description of new entry")
+    ps_embed_add.set_defaults(func=embedded_add)
+
+    # -------------------------------------------------------------------------
+    # 'embed-del' command
+    # -------------------------------------------------------------------------
+    ps_embed_del = subps.add_parser(
+        "embed-del", description=mycenter("delete embedded file")
+    )
+    ps_embed_del.add_argument("input", help="PDF filename")
+    ps_embed_del.add_argument("-password", help="password")
+    ps_embed_del.add_argument(
+        "-output", help="output PDF filename, incremental save if none"
+    )
+    ps_embed_del.add_argument("-name", required=True, help="name of entry to delete")
+    ps_embed_del.set_defaults(func=embedded_del)
+
+    # -------------------------------------------------------------------------
+    # 'embed-upd' command
+    # -------------------------------------------------------------------------
+    ps_embed_upd = subps.add_parser(
+        "embed-upd",
+        description=mycenter("update embedded file"),
+        epilog="all parameters except '-name' are optional",
+    )
+    ps_embed_upd.add_argument("input", help="PDF filename")
+    ps_embed_upd.add_argument("-name", required=True, help="name of entry")
+    ps_embed_upd.add_argument("-password", help="password")
+    ps_embed_upd.add_argument(
+        "-output", help="output PDF filename, incremental save if none"
+    )
+    ps_embed_upd.add_argument("-path", help="path to new data for entry")
+    ps_embed_upd.add_argument("-filename", help="new filename to store in entry")
+    ps_embed_upd.add_argument(
+        "-ufilename", help="new unicode filename to store in entry"
+    )
+    ps_embed_upd.add_argument("-desc", help="new description to store in entry")
+    ps_embed_upd.set_defaults(func=embedded_upd)
+
+    # -------------------------------------------------------------------------
+    # 'embed-extract' command
+    # -------------------------------------------------------------------------
+    ps_embed_extract = subps.add_parser(
+        "embed-extract", description=mycenter("extract embedded file to disk")
+    )
+    ps_embed_extract.add_argument("input", type=str, help="PDF filename")
+    ps_embed_extract.add_argument("-name", required=True, help="name of entry")
+    ps_embed_extract.add_argument("-password", help="password")
+    ps_embed_extract.add_argument(
+        "-output", help="output filename, default is stored name"
+    )
+    ps_embed_extract.set_defaults(func=embedded_get)
+
+    # -------------------------------------------------------------------------
+    # 'embed-copy' command
+    # -------------------------------------------------------------------------
+    ps_embed_copy = subps.add_parser(
+        "embed-copy", description=mycenter("copy embedded files between PDFs")
+    )
+    ps_embed_copy.add_argument("input", type=str, help="PDF to receive embedded files")
+    ps_embed_copy.add_argument("-password", help="password of input")
+    ps_embed_copy.add_argument(
+        "-output", help="output PDF, incremental save to 'input' if omitted"
+    )
+    ps_embed_copy.add_argument(
+        "-source", required=True, help="copy embedded files from here"
+    )
+    ps_embed_copy.add_argument("-pwdsource", help="password of 'source' PDF")
+    ps_embed_copy.add_argument(
+        "-name", nargs="*", help="restrict copy to these entries"
+    )
+    ps_embed_copy.set_defaults(func=embedded_copy)
+
+    # -------------------------------------------------------------------------
+    # start program
+    # -------------------------------------------------------------------------
+    args = parser.parse_args()  # create parameter arguments class
+    if not hasattr(args, "func"):  # no function selected
+        parser.print_help()  # so print top level help
+    else:
+        args.func(args)  # execute requested command
+
+
+if __name__ == "__main__":
+    main()
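The `-pages` and `-xrefs` options above accept a spec such as `1,5-7,50-N`, where `N` stands for the highest valid number. The real parsing is done by `get_list`, which is defined outside this excerpt, so its exact rules (spaces, descending ranges, error messages) are not shown here. The following is only a hypothetical stand-alone sketch of how such a spec can expand into 1-based numbers; `parse_page_spec` and `_to_int` are illustrative names, not PyMuPDF API.

```python
def _to_int(token, last):
    """Convert one token of a spec to an int; 'N' means `last` (hypothetical helper)."""
    value = last if token.upper() == "N" else int(token)
    if not 1 <= value <= last:
        raise ValueError("number %i not in range 1..%i" % (value, last))
    return value


def parse_page_spec(spec, last):
    """Expand a spec like "1,5-7,50-N" into a list of 1-based numbers.

    Hypothetical sketch: ranges are inclusive and must be ascending.
    """
    numbers = []
    for item in spec.replace(" ", "").split(","):
        if not item:  # tolerate stray commas such as "1,,2"
            continue
        parts = item.split("-")
        if len(parts) == 1:  # single number, e.g. "5" or "N"
            start = stop = _to_int(parts[0], last)
        elif len(parts) == 2:  # inclusive range, e.g. "5-7" or "50-N"
            start, stop = _to_int(parts[0], last), _to_int(parts[1], last)
        else:
            raise ValueError("bad item in spec: %r" % item)
        if start > stop:
            raise ValueError("descending range not supported: %r" % item)
        numbers.extend(range(start, stop + 1))
    return numbers


print(parse_page_spec("1,5-7,50-N", 52))  # [1, 5, 6, 7, 50, 51, 52]
```

Because the CLI subcommands subtract 1 before calling page methods (e.g. `doc.getPageFontList(pno - 1)`), a parser of this kind would naturally return 1-based numbers and leave the conversion to the caller.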
diff --git a/fitz/fitz.i b/fitz/fitz.i
new file mode 100644
index 0000000..6c6c66d
--- /dev/null
+++ b/fitz/fitz.i
@@ -0,0 +1,9443 @@
+%module fitz
+%pythonbegin %{
+from __future__ import division, print_function
+%}
+//-----------------------------------------------------------------------------
+// SWIG macro: generate fitz exceptions
+//-----------------------------------------------------------------------------
+%define FITZEXCEPTION(meth, cond)
+%exception meth
+{
+    $action
+    if (cond) {
+        PyErr_SetString(PyExc_RuntimeError, fz_caught_message(gctx));
+        return NULL;
+    }
+}
+%enddef
+
+//-----------------------------------------------------------------------------
+// SWIG macro: check that a document is not closed / encrypted
+//-----------------------------------------------------------------------------
+%define CLOSECHECK(meth, doc)
+%pythonprepend meth %{doc
+if self.isClosed or self.isEncrypted:
+    raise ValueError("document closed or encrypted")%}
+%enddef
+
+%define CLOSECHECK0(meth, doc)
+%pythonprepend meth%{doc
+if self.isClosed:
+    raise ValueError("document closed")%}
+%enddef
+
+//-----------------------------------------------------------------------------
+// SWIG macro: check if object has a valid parent
+//-----------------------------------------------------------------------------
+%define PARENTCHECK(meth, doc)
+%pythonprepend meth %{doc
+CheckParent(self)%}
+%enddef
+
+
+%{
+#define MEMDEBUG 0
+#if MEMDEBUG == 1
+    #define DEBUGMSG1(x) PySys_WriteStderr("[DEBUG] free %s ", x)
+    #define DEBUGMSG2 PySys_WriteStderr("... done!\n")
+#else
+    #define DEBUGMSG1(x)
+    #define DEBUGMSG2
+#endif
+
+#ifndef FLT_EPSILON
+  #define FLT_EPSILON 1e-5
+#endif
+
+#define return_none Py_RETURN_NONE
+#define SWIG_FILE_WITH_INIT
+#define SWIG_PYTHON_2_UNICODE
+
+// memory allocation macros
+#define JM_MEMORY 1
+#if  PY_VERSION_HEX < 0x03000000
+    #undef JM_MEMORY
+    #define JM_MEMORY 0
+#endif
+
+#if JM_MEMORY == 1
+    #define JM_Alloc(type, len) PyMem_New(type, len)
+    #define JM_Free(x) PyMem_Del(x)
+#else
+    #define JM_Alloc(type, len) (type *) malloc(sizeof(type)*len)
+    #define JM_Free(x) free(x)
+#endif
+
+#define EXISTS(x) ((x) != NULL && PyObject_IsTrue(x) == 1)
+#define THROWMSG(msg) fz_throw(gctx, FZ_ERROR_GENERIC, msg)
+#define ASSERT_PDF(cond) if ((cond) == NULL) fz_throw(gctx, FZ_ERROR_GENERIC, "not a PDF")
+#define INRANGE(v, low, high) ((low) <= (v) && (v) <= (high))
+#define MAX(a, b) (((a) < (b)) ? (b) : (a))
+#define MIN(a, b) (((a) < (b)) ? (a) : (b))
+
+#define JM_PyErr_Clear if (PyErr_Occurred()) PyErr_Clear()
+
+// binary output depends on Python major
+# if PY_VERSION_HEX >= 0x03000000
+    #define JM_BinFromChar(x) PyBytes_FromString(x)
+    #define JM_BinFromCharSize(x, y) PyBytes_FromStringAndSize(x, (Py_ssize_t) y)
+# else
+    #define JM_BinFromChar(x) PyByteArray_FromStringAndSize(x, (Py_ssize_t) strlen(x))
+    #define JM_BinFromCharSize(x, y) PyByteArray_FromStringAndSize(x, (Py_ssize_t) y)
+# endif
+
+#include <fitz.h>
+#include <pdf.h>
+#include <time.h>
+char *JM_Python_str_AsChar(PyObject *str);
+
+// additional headers from MuPDF ----------------------------------------------
+pdf_obj *pdf_lookup_page_loc(fz_context *ctx, pdf_document *doc, int needle, pdf_obj **parentp, int *indexp);
+fz_pixmap *fz_scale_pixmap(fz_context *ctx, fz_pixmap *src, float x, float y, float w, float h, const fz_irect *clip);
+int fz_pixmap_size(fz_context *ctx, fz_pixmap *src);
+void fz_subsample_pixmap(fz_context *ctx, fz_pixmap *tile, int factor);
+void fz_copy_pixmap_rect(fz_context *ctx, fz_pixmap *dest, fz_pixmap *src, fz_irect b, const fz_default_colorspaces *default_cs);
+// end of additional MuPDF headers --------------------------------------------
+
+PyObject *JM_mupdf_warnings_store;
+PyObject *JM_mupdf_show_errors;
+%}
+
+//-----------------------------------------------------------------------------
+// global context
+//-----------------------------------------------------------------------------
+%init %{
+#if JM_MEMORY == 1
+    gctx = fz_new_context(&JM_Alloc_Context, NULL, FZ_STORE_DEFAULT);
+#else
+    gctx = fz_new_context(NULL, NULL, FZ_STORE_DEFAULT);
+#endif
+    if(!gctx)
+    {
+        PyErr_SetString(PyExc_RuntimeError, "Fatal error: could not create global context.");
+# if PY_VERSION_HEX >= 0x03000000
+       return NULL;
+# else
+       return;
+# endif
+    }
+    fz_register_document_handlers(gctx);
+
+//-----------------------------------------------------------------------------
+// START redirect stdout/stderr
+//-----------------------------------------------------------------------------
+JM_mupdf_warnings_store = PyList_New(0);
+JM_mupdf_show_errors = Py_True;
+static char user[] = "PyMuPDF";  // static: MuPDF keeps this pointer after init returns
+fz_set_warning_callback(gctx, JM_mupdf_warning, user);
+fz_set_error_callback(gctx, JM_mupdf_error, user);
+//-----------------------------------------------------------------------------
+// STOP redirect stdout/stderr
+//-----------------------------------------------------------------------------
+// init global constants
+//-----------------------------------------------------------------------------
+dictkey_align = PyString_InternFromString("align");
+dictkey_bbox = PyString_InternFromString("bbox");
+dictkey_blocks = PyString_InternFromString("blocks");
+dictkey_bpc = PyString_InternFromString("bpc");
+dictkey_c = PyString_InternFromString("c");
+dictkey_chars = PyString_InternFromString("chars");
+dictkey_color = PyString_InternFromString("color");
+dictkey_colorspace = PyString_InternFromString("colorspace");
+dictkey_content = PyString_InternFromString("content");
+dictkey_creationDate = PyString_InternFromString("creationDate");
+dictkey_cs_name = PyString_InternFromString("cs-name");
+dictkey_da = PyString_InternFromString("da");
+dictkey_dashes = PyString_InternFromString("dashes");
+dictkey_desc = PyString_InternFromString("desc");
+dictkey_dir = PyString_InternFromString("dir");
+dictkey_effect = PyString_InternFromString("effect");
+dictkey_ext = PyString_InternFromString("ext");
+dictkey_filename = PyString_InternFromString("filename");
+dictkey_fill = PyString_InternFromString("fill");
+dictkey_flags = PyString_InternFromString("flags");
+dictkey_font = PyString_InternFromString("font");
+dictkey_height = PyString_InternFromString("height");
+dictkey_id = PyString_InternFromString("id");
+dictkey_image = PyString_InternFromString("image");
+dictkey_length = PyString_InternFromString("length");
+dictkey_lines = PyString_InternFromString("lines");
+dictkey_modDate = PyString_InternFromString("modDate");
+dictkey_name = PyString_InternFromString("name");
+dictkey_origin = PyString_InternFromString("origin");
+dictkey_size = PyString_InternFromString("size");
+dictkey_smask = PyString_InternFromString("smask");
+dictkey_spans = PyString_InternFromString("spans");
+dictkey_stroke = PyString_InternFromString("stroke");
+dictkey_style = PyString_InternFromString("style");
+dictkey_subject = PyString_InternFromString("subject");
+dictkey_text = PyString_InternFromString("text");
+dictkey_title = PyString_InternFromString("title");
+dictkey_type = PyString_InternFromString("type");
+dictkey_ufilename = PyString_InternFromString("ufilename");
+dictkey_width = PyString_InternFromString("width");
+dictkey_wmode = PyString_InternFromString("wmode");
+dictkey_xref = PyString_InternFromString("xref");
+dictkey_xres = PyString_InternFromString("xres");
+dictkey_yres = PyString_InternFromString("yres");
+%}
+
+%header %{
+fz_context *gctx;
+static int JM_UNIQUE_ID = 0;
+
+struct DeviceWrapper {
+    fz_device *device;
+    fz_display_list *list;
+};
+%}
+
+//-----------------------------------------------------------------------------
+// include version information and several other helpers
+//-----------------------------------------------------------------------------
+%pythoncode %{
+import io
+import math
+import os
+import weakref
+from binascii import hexlify
+
+fitz_py2 = str is bytes  # if true, this is Python 2
+string_types = (str, unicode) if fitz_py2 else (str,)
+%}
+%include version.i
+%include helper-defines.i
+%include helper-geo-c.i
+%include helper-other.i
+%include helper-pixmap.i
+%include helper-geo-py.i
+%include helper-annot.i
+%include helper-stext.i
+%include helper-fields.i
+%include helper-python.i
+%include helper-portfolio.i
+%include helper-select.i
+%include helper-xobject.i
+%include helper-pdfinfo.i
+%include helper-convert.i
+
+//-----------------------------------------------------------------------------
+// fz_document
+//-----------------------------------------------------------------------------
+struct Document
+{
+    %extend
+    {
+        ~Document()
+        {
+            DEBUGMSG1("Document w/o close");
+            fz_document *this_doc = (fz_document *) $self;
+            fz_drop_document(gctx, this_doc);
+            DEBUGMSG2;
+        }
+        FITZEXCEPTION(Document, !result)
+
+        %pythonprepend Document %{
+        """Creates a document. Use 'open' as a synonym.
+
+        Notes:
+            Basic usages:
+            open() - creates new empty PDF document
+            open(filename) - string or pathlib.Path, must have supported
+                    file extension.
+            open(type, buffer) - type: valid extension, buffer: bytes object.
+            open(stream=buffer, filetype=type) - keyword version of previous.
+            open(filename, filetype=type) - filename with unrecognized extension.
+
+            rect, width, height, fontsize may be used to re-layout reflowable documents
+            on open (e.g. EPUB). Ignored if not applicable.
+        """
+
+        if not filename or type(filename) is str:
+            pass
+        else:
+            if fitz_py2:  # Python 2
+                if type(filename) is unicode:
+                    filename = filename.encode("utf8")
+            else:
+                filename = str(filename)  # takes care of pathlib.Path
+
+        if stream:
+            if not (filename or filetype):
+                raise ValueError("need filetype for opening a stream")
+
+            if type(stream) is bytes:
+                self.stream = stream
+            elif type(stream) is bytearray:
+                self.stream = bytes(stream)
+            elif type(stream) is io.BytesIO:
+                self.stream = stream.getvalue()
+            else:
+                raise ValueError("bad type: 'stream'")
+            stream = self.stream
+        else:
+            self.stream = None
+
+        if filename and not stream:
+            self.name = filename
+        else:
+            self.name = ""
+
+        self.isClosed    = False
+        self.isEncrypted = False
+        self.metadata    = None
+        self.FontInfos   = []
+        self.Graftmaps   = {}
+        self.ShownPages  = {}
+        self._page_refs  = weakref.WeakValueDictionary()%}
+
+        %pythonappend Document %{
+            if self.thisown:
+                self._graft_id = TOOLS.gen_id()
+                if self.needsPass is True:
+                    self.isEncrypted = True
+                else: # we won't init until doc is decrypted
+                    self.initData()
+        %}
+
+        Document(const char *filename=NULL, PyObject *stream=NULL,
+                      const char *filetype=NULL, PyObject *rect=NULL,
+                      float width=0, float height=0,
+                      float fontsize=11)
+        {
+            gctx->error.errcode = 0;       // reset any error code
+            gctx->error.message[0] = 0;    // reset any error message
+            fz_document *doc = NULL;
+            char *c = NULL;
+            size_t len = 0;
+            fz_stream *data = NULL;
+            float w = width, h = height;
+            fz_rect r = JM_rect_from_py(rect);
+            if (!fz_is_infinite_rect(r)) {
+                w = r.x1 - r.x0;
+                h = r.y1 - r.y0;
+            }
+
+            fz_try(gctx) {
+                if (stream != Py_None) { // stream given, **MUST** be bytes!
+
+                    c = PyBytes_AS_STRING(stream); // just a pointer, no new obj
+                    len = (size_t) PyBytes_Size(stream);
+                    data = fz_open_memory(gctx, (const unsigned char *) c, len);
+                    char *magic = (char *)filename;
+                    if (!magic) magic = (char *)filetype;
+                    doc = fz_open_document_with_stream(gctx, magic, data);
+                } else {
+                    if (filename) {
+                        if (!filetype || strlen(filetype) == 0) {
+                            doc = fz_open_document(gctx, filename);
+                        } else {
+                            const fz_document_handler *handler;
+                            handler = fz_recognize_document(gctx, filetype);
+                            if (handler && handler->open)
+                                doc = handler->open(gctx, filename);
+                            else THROWMSG("unrecognized file type");
+                        }
+                    } else {
+                        pdf_document *pdf = pdf_create_document(gctx);
+                        pdf->dirty = 1;
+                        doc = (fz_document *) pdf;
+                    }
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            if (w > 0 && h > 0) {
+                fz_layout_document(gctx, doc, w, h, fontsize);
+            }
+            return (struct Document *) doc;
+        }
+
+        %pythonprepend close %{
+            """Close document."""
+            if self.isClosed:
+                raise ValueError("document closed")
+            if hasattr(self, "_outline") and self._outline:
+                self._dropOutline(self._outline)
+                self._outline = None
+            self._reset_page_refs()
+            self.metadata    = None
+            self.stream      = None
+            self.isClosed    = True
+            self.FontInfos   = []
+            for gmap in self.Graftmaps:
+                self.Graftmaps[gmap] = None
+            self.Graftmaps = {}
+            self.ShownPages = {}
+        %}
+
+        %pythonappend close %{self.thisown = False%}
+        void close()
+        {
+            DEBUGMSG1("Document after close");
+            fz_document *doc = (fz_document *) $self;
+            while(doc->refs > 1) {
+                fz_drop_document(gctx, doc);
+            }
+            fz_drop_document(gctx, doc);
+            DEBUGMSG2;
+        }
+
+        FITZEXCEPTION(loadPage, !result)
+        %pythonprepend loadPage %{
+        """Load a page.
+
+        'page_id' is either a 0-based page number or a tuple (chapter, pno),
+        with chapter number and page number within that chapter.
+        """
+
+        if self.isClosed or self.isEncrypted:
+            raise ValueError("document closed or encrypted")
+        if page_id is None:
+            page_id = 0
+        if page_id not in self:
+            raise ValueError("page not in document")
+        if type(page_id) is int and page_id < 0:
+            np = self.pageCount
+            while page_id < 0:
+                page_id += np
+        %}
+        %pythonappend loadPage %{
+        val.thisown = True
+        val.parent = weakref.proxy(self)
+        self._page_refs[id(val)] = val
+        val._annot_refs = weakref.WeakValueDictionary()
+        val.number = page_id
+        %}
+        struct Page *
+        loadPage(PyObject *page_id)
+        {
+            fz_page *page = NULL;
+            fz_document *doc = (fz_document *) $self;
+            int pno;
+            PyObject *val = NULL;
+            fz_try(gctx) {
+                if (PySequence_Check(page_id)) {
+                    val = PySequence_GetItem(page_id, 0);
+                    if (!val) THROWMSG("bad page id");
+                    int chapter = (int) PyLong_AsLong(val);
+                    Py_DECREF(val);
+                    if (PyErr_Occurred()) THROWMSG("bad page id");
+
+                    val = PySequence_GetItem(page_id, 1);
+                    if (!val) THROWMSG("bad page id");
+                    pno = (int) PyLong_AsLong(val);
+                    Py_DECREF(val);
+                    if (PyErr_Occurred()) THROWMSG("bad page id");
+
+                    page = fz_load_chapter_page(gctx, doc, chapter, pno);
+                } else {
+                    pno = (int) PyLong_AsLong(page_id);
+                    if (PyErr_Occurred()) THROWMSG("bad page id");
+                    page = fz_load_page(gctx, doc, pno);
+                }
+            }
+            fz_catch(gctx) {
+                PyErr_Clear();
+                return NULL;
+            }
+            PyErr_Clear();
+            return (struct Page *) page;
+        }
+
+
+        FITZEXCEPTION(_remove_links_to, !result)
+        PyObject *_remove_links_to(int first, int last)
+        {
+            fz_try(gctx) {
+                fz_document *doc = (fz_document *) $self;
+                pdf_document *pdf = pdf_specifics(gctx, doc);
+                pdf_drop_page_tree(gctx, pdf);
+                pdf_load_page_tree(gctx, pdf);
+                remove_dest_range(gctx, pdf, first, last);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        CLOSECHECK0(_loadOutline, """Load first outline.""")
+        struct Outline *_loadOutline()
+        {
+            fz_outline *ol = NULL;
+            fz_document *doc = (fz_document *) $self;
+            fz_try(gctx) {
+                ol = fz_load_outline(gctx, doc);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Outline *) ol;
+        }
+
+        void _dropOutline(struct Outline *ol) {
+            DEBUGMSG1("Outline");
+            fz_outline *this_ol = (fz_outline *) ol;
+            fz_drop_outline(gctx, this_ol);
+            DEBUGMSG2;
+        }
+
+        //---------------------------------------------------------------------
+        // EmbeddedFiles utility functions
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_embeddedFileNames, !result)
+        CLOSECHECK0(_embeddedFileNames, """Get list of embedded file names.""")
+        PyObject *_embeddedFileNames(PyObject *namelist)
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_specifics(gctx, doc);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                PyObject *val;
+                pdf_obj *names = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf),
+                                      PDF_NAME(Root),
+                                      PDF_NAME(Names),
+                                      PDF_NAME(EmbeddedFiles),
+                                      PDF_NAME(Names),
+                                      NULL);
+                if (pdf_is_array(gctx, names)) {
+                    int i, n = pdf_array_len(gctx, names);
+                    for (i=0; i < n; i+=2) {
+                        val = JM_EscapeStrFromStr(pdf_to_text_string(gctx,
+                                         pdf_array_get(gctx, names, i)));
+                        LIST_APPEND_DROP(namelist, val);
+                    }
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        FITZEXCEPTION(_embeddedFileDel, !result)
+        PyObject *_embeddedFileDel(int idx)
+        {
+            fz_try(gctx) {
+                fz_document *doc = (fz_document *) $self;
+                pdf_document *pdf = pdf_document_from_fz_document(gctx, doc);
+                pdf_obj *names = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf),
+                                      PDF_NAME(Root),
+                                      PDF_NAME(Names),
+                                      PDF_NAME(EmbeddedFiles),
+                                      PDF_NAME(Names),
+                                      NULL);
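+                // /Names holds pairs [name_0, filespec_0, name_1, filespec_1, ...],
+                // so item idx occupies positions 2*idx and 2*idx+1 (delete the later one first).
+                pdf_array_delete(gctx, names, 2*idx + 1);
+                pdf_array_delete(gctx, names, 2*idx);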
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        FITZEXCEPTION(_embeddedFileInfo, !result)
+        PyObject *_embeddedFileInfo(int idx, PyObject *infodict)
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_document_from_fz_document(gctx, doc);
+            char *name;
+            fz_try(gctx) {
+                pdf_obj *names = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf),
+                                      PDF_NAME(Root),
+                                      PDF_NAME(Names),
+                                      PDF_NAME(EmbeddedFiles),
+                                      PDF_NAME(Names),
+                                      NULL);
+
+                pdf_obj *o = pdf_array_get(gctx, names, 2*idx+1);
+
+                name = (char *) pdf_to_text_string(gctx,
+                                          pdf_dict_get(gctx, o, PDF_NAME(F)));
+                DICT_SETITEM_DROP(infodict, dictkey_filename, JM_EscapeStrFromStr(name));
+
+                name = (char *) pdf_to_text_string(gctx,
+                                    pdf_dict_get(gctx, o, PDF_NAME(UF)));
+                DICT_SETITEM_DROP(infodict, dictkey_ufilename, JM_EscapeStrFromStr(name));
+
+                name = (char *) pdf_to_text_string(gctx,
+                                    pdf_dict_get(gctx, o, PDF_NAME(Desc)));
+                DICT_SETITEM_DROP(infodict, dictkey_desc, JM_UnicodeFromStr(name));
+
+                int len = -1, DL = -1;
+                pdf_obj *ef = pdf_dict_get(gctx, o, PDF_NAME(EF));
+                o = pdf_dict_getl(gctx, ef, PDF_NAME(F),
+                                            PDF_NAME(Length), NULL);
+                if (o) len = pdf_to_int(gctx, o);
+
+                o = pdf_dict_getl(gctx, ef, PDF_NAME(F), PDF_NAME(DL), NULL);
+                if (o) {
+                    DL = pdf_to_int(gctx, o);
+                } else {
+                    o = pdf_dict_getl(gctx, ef, PDF_NAME(F), PDF_NAME(Params),
+                                   PDF_NAME(Size), NULL);
+                    if (o) DL = pdf_to_int(gctx, o);
+                }
+                DICT_SETITEM_DROP(infodict, dictkey_size, Py_BuildValue("i", DL));
+                DICT_SETITEM_DROP(infodict, dictkey_length, Py_BuildValue("i", len));
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        FITZEXCEPTION(_embeddedFileUpd, !result)
+        PyObject *_embeddedFileUpd(int idx, PyObject *buffer = NULL, char *filename = NULL, char *ufilename = NULL, char *desc = NULL)
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_document_from_fz_document(gctx, doc);
+            fz_buffer *res = NULL;
+            fz_var(res);
+            fz_try(gctx) {
+                pdf_obj *names = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf),
+                                      PDF_NAME(Root),
+                                      PDF_NAME(Names),
+                                      PDF_NAME(EmbeddedFiles),
+                                      PDF_NAME(Names),
+                                      NULL);
+
+                pdf_obj *entry = pdf_array_get(gctx, names, 2*idx+1);
+
+                pdf_obj *filespec = pdf_dict_getl(gctx, entry, PDF_NAME(EF),
+                                                  PDF_NAME(F), NULL);
+                if (!filespec) THROWMSG("bad PDF: /EF object not found");
+
+                res = JM_BufferFromBytes(gctx, buffer);
+                if (EXISTS(buffer) && !res) THROWMSG("bad type: 'buffer'");
+                if (res)
+                {
+                    JM_update_stream(gctx, pdf, filespec, res, 1);
+                    // adjust /DL and /Size parameters
+                    int64_t len = (int64_t) fz_buffer_storage(gctx, res, NULL);
+                    pdf_obj *l = pdf_new_int(gctx, len);
+                    pdf_dict_put(gctx, filespec, PDF_NAME(DL), l);
+                    pdf_dict_putl(gctx, filespec, l, PDF_NAME(Params), PDF_NAME(Size), NULL);
+                    pdf_drop_obj(gctx, l);  // both dicts keep their own reference
+                }
+
+                if (filename)
+                    pdf_dict_put_text_string(gctx, entry, PDF_NAME(F), filename);
+
+                if (ufilename)
+                    pdf_dict_put_text_string(gctx, entry, PDF_NAME(UF), ufilename);
+
+                if (desc)
+                    pdf_dict_put_text_string(gctx, entry, PDF_NAME(Desc), desc);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx)
+                return NULL;
+            pdf->dirty = 1;
+            return_none;
+        }
+
+        FITZEXCEPTION(_embeddedFileGet, !result)
+        PyObject *_embeddedFileGet(int idx)
+        {
+            fz_document *doc = (fz_document *) $self;
+            PyObject *cont = NULL;
+            pdf_document *pdf = pdf_document_from_fz_document(gctx, doc);
+            fz_buffer *buf = NULL;
+            fz_var(buf);
+            fz_try(gctx) {
+                pdf_obj *names = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf),
+                                      PDF_NAME(Root),
+                                      PDF_NAME(Names),
+                                      PDF_NAME(EmbeddedFiles),
+                                      PDF_NAME(Names),
+                                      NULL);
+
+                pdf_obj *entry = pdf_array_get(gctx, names, 2*idx+1);
+                pdf_obj *filespec = pdf_dict_getl(gctx, entry, PDF_NAME(EF),
+                                                  PDF_NAME(F), NULL);
+                buf = pdf_load_stream(gctx, filespec);
+                cont = JM_BinFromBuffer(gctx, buf);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, buf);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return cont;
+        }
+
+        FITZEXCEPTION(_embeddedFileAdd, !result)
+        PyObject *_embeddedFileAdd(const char *name, PyObject *buffer, char *filename=NULL, char *ufilename=NULL, char *desc=NULL)
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_document_from_fz_document(gctx, doc);
+            fz_buffer *data = NULL;
+            unsigned char *buffdata;
+            fz_var(data);
+            int entry = 0;
+            size_t size = 0;
+            pdf_obj *names = NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                data = JM_BufferFromBytes(gctx, buffer);
+                if (!data) THROWMSG("bad type: 'buffer'");
+                size = fz_buffer_storage(gctx, data, &buffdata);
+
+                names = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf),
+                                      PDF_NAME(Root),
+                                      PDF_NAME(Names),
+                                      PDF_NAME(EmbeddedFiles),
+                                      PDF_NAME(Names),
+                                      NULL);
+                if (!pdf_is_array(gctx, names)) {
+                    pdf_obj *root = pdf_dict_get(gctx, pdf_trailer(gctx, pdf),
+                                                 PDF_NAME(Root));
+                    names = pdf_new_array(gctx, pdf, 6);  // an even number!
+                    pdf_dict_putl_drop(gctx, root, names,
+                                      PDF_NAME(Names),
+                                      PDF_NAME(EmbeddedFiles),
+                                      PDF_NAME(Names),
+                                      NULL);
+                }
+
+                pdf_obj *fileentry = JM_embed_file(gctx, pdf, data,
+                                                   filename,
+                                                   ufilename,
+                                                   desc, 1);
+                pdf_array_push_drop(gctx, names, pdf_new_text_string(gctx, name));
+                pdf_array_push_drop(gctx, names, fileentry);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, data);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return_none;
+        }
+
+        %pythoncode %{
+        def embeddedFileNames(self):
+            """Get list of names of EmbeddedFiles."""
+            filenames = []
+            self._embeddedFileNames(filenames)
+            return filenames
+
+        def _embeddedFileIndex(self, item):
+            filenames = self.embeddedFileNames()
+            msg = "'%s' not in EmbeddedFiles array." % str(item)
+            if item in filenames:
+                idx = filenames.index(item)
+            elif item in range(len(filenames)):
+                idx = item
+            else:
+                raise ValueError(msg)
+            return idx
+
+        def embeddedFileCount(self):
+            """Get number of EmbeddedFiles."""
+            return len(self.embeddedFileNames())
+
+        def embeddedFileDel(self, item):
+            """Delete an entry from EmbeddedFiles.
+
+            Notes:
+                The argument must be name or index of an EmbeddedFiles item.
+                Physical deletion of data will happen on save to a new
+                file with appropriate garbage option.
+            Args:
+                item: (str/int) name or number of the entry.
+            Returns:
+                None
+            """
+            idx = self._embeddedFileIndex(item)
+            return self._embeddedFileDel(idx)
+
+        def embeddedFileInfo(self, item):
+            """Get information of an item in the EmbeddedFiles array.
+
+            Args:
+                item: number or name of item.
+            Returns:
+                Information dictionary.
+            """
+            idx = self._embeddedFileIndex(item)
+            infodict = {"name": self.embeddedFileNames()[idx]}
+            self._embeddedFileInfo(idx, infodict)
+            return infodict
+
+        def embeddedFileGet(self, item):
+            """Get the content of an item in the EmbeddedFiles array.
+
+            Args:
+                item: number or name of item.
+            Returns:
+                (bytes) The file content.
+            """
+            idx = self._embeddedFileIndex(item)
+            return self._embeddedFileGet(idx)
+
+        def embeddedFileUpd(self, item, buffer=None,
+                                  filename=None,
+                                  ufilename=None,
+                                  desc=None):
+            """Change an item of the EmbeddedFiles array.
+
+            Notes:
+                All parameters except 'item' are optional. If all of them are
+                omitted, the method is a no-op.
+            Args:
+                item: the number or the name of the item.
+                buffer: (binary data) the new file content.
+                filename: (str) the new file name.
+                ufilename: (unicode) the new file name.
+                desc: (str) the new description.
+            """
+            idx = self._embeddedFileIndex(item)
+            return self._embeddedFileUpd(idx, buffer=buffer,
+                                         filename=filename,
+                                         ufilename=ufilename,
+                                         desc=desc)
+
+        def embeddedFileAdd(self, name, buffer,
+                                  filename=None,
+                                  ufilename=None,
+                                  desc=None):
+            """Add an item to the EmbeddedFiles array.
+
+            Args:
+                name: the name of the new item.
+                buffer: (binary data) the file content.
+                filename: (str) the file name.
+                ufilename: (unicode) the file name.
+                desc: (str) the description.
+            """
+            filenames = self.embeddedFileNames()
+            msg = "Name '%s' already in EmbeddedFiles array." % str(name)
+            if name in filenames:
+                raise ValueError(msg)
+
+            if filename is None:
+                filename = name
+            if ufilename is None:
+                ufilename = unicode(filename, "utf8") if str is bytes else filename
+            if desc is None:
+                desc = name
+            return self._embeddedFileAdd(name, buffer=buffer,
+                                         filename=filename,
+                                         ufilename=ufilename,
+                                         desc=desc)
+        %}
+
+        FITZEXCEPTION(convertToPDF, !result)
+        CLOSECHECK(convertToPDF, """Convert document to a PDF, selecting page range and optional rotation. Output bytes object.""")
+        PyObject *convertToPDF(int from_page=0, int to_page=-1, int rotate=0)
+        {
+            PyObject *doc = NULL;
+            fz_document *fz_doc = (fz_document *) $self;
+            fz_try(gctx) {
+                int fp = from_page, tp = to_page, srcCount = fz_count_pages(gctx, fz_doc);
+                if (pdf_specifics(gctx, fz_doc))
+                    THROWMSG("use select+write or insertPDF for PDF docs instead");
+                if (fp < 0) fp = 0;
+                if (fp > srcCount - 1) fp = srcCount - 1;
+                if (tp < 0) tp = srcCount - 1;
+                if (tp > srcCount - 1) tp = srcCount - 1;
+                doc = JM_convert_to_pdf(gctx, fz_doc, fp, tp, rotate);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return doc;
+        }
+
+        CLOSECHECK0(pageCount, """Number of pages.""")
+        %pythoncode%{@property%}
+        PyObject *pageCount()
+        {
+            return Py_BuildValue("i", fz_count_pages(gctx, (fz_document *) $self));
+        }
+
+        CLOSECHECK0(chapterCount, """Number of chapters.""")
+        %pythoncode%{@property%}
+        PyObject *chapterCount()
+        {
+            return Py_BuildValue("i", fz_count_chapters(gctx, (fz_document *) $self));
+        }
+
+        FITZEXCEPTION(lastLocation, !result)
+        CLOSECHECK0(lastLocation, """Id (chapter, page) of last page.""")
+        %pythoncode%{@property%}
+        PyObject *lastLocation()
+        {
+            fz_document *this_doc = (fz_document *) $self;
+            fz_location last_loc;
+            fz_try(gctx) {
+                last_loc = fz_last_page(gctx, this_doc);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return Py_BuildValue("ii", last_loc.chapter, last_loc.page);
+        }
+
+
+        FITZEXCEPTION(chapterPageCount, !result)
+        CLOSECHECK0(chapterPageCount, """Page count of chapter.""")
+        PyObject *chapterPageCount(int chapter)
+        {
+            int chapters = fz_count_chapters(gctx, (fz_document *) $self);
+            int pages = 0;
+            fz_try(gctx) {
+                if (chapter < 0 || chapter >= chapters)
+                    THROWMSG("bad chapter number");
+                pages = fz_count_chapter_pages(gctx, (fz_document *) $self, chapter);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return Py_BuildValue("i", pages);
+        }
+
+        FITZEXCEPTION(previousLocation, !result)
+        %pythonprepend previousLocation %{
+        """Get (chapter, page) of previous page."""
+        if self.isClosed or self.isEncrypted:
+            raise ValueError("document closed or encrypted")
+        if type(page_id) is int:
+            page_id = (0, page_id)
+        if page_id not in self:
+            raise ValueError("page id not in document")
+        if tuple(page_id) == (0, 0):
+            return ()
+        %}
+        PyObject *previousLocation(PyObject *page_id)
+        {
+            fz_document *this_doc = (fz_document *) $self;
+            fz_location prev_loc, loc;
+            PyObject *val;
+            int pno;
+            fz_try(gctx) {
+                val = PySequence_GetItem(page_id, 0);
+                if (!val) THROWMSG("bad page id");
+                int chapter = (int) PyLong_AsLong(val);
+                Py_DECREF(val);
+                if (PyErr_Occurred()) THROWMSG("bad page id");
+
+                val = PySequence_GetItem(page_id, 1);
+                if (!val) THROWMSG("bad page id");
+                pno = (int) PyLong_AsLong(val);
+                Py_DECREF(val);
+                if (PyErr_Occurred()) THROWMSG("bad page id");
+
+                loc = fz_make_location(chapter, pno);
+                prev_loc = fz_previous_page(gctx, this_doc, loc);
+            }
+            fz_catch(gctx) {
+                PyErr_Clear();
+                return NULL;
+            }
+            return Py_BuildValue("ii", prev_loc.chapter, prev_loc.page);
+        }
+
+
+        FITZEXCEPTION(nextLocation, !result)
+        %pythonprepend nextLocation %{
+        """Get (chapter, page) of next page."""
+        if self.isClosed or self.isEncrypted:
+            raise ValueError("document closed or encrypted")
+        if type(page_id) is int:
+            page_id = (0, page_id)
+        if page_id not in self:
+            raise ValueError("page id not in document")
+        if tuple(page_id)  == self.lastLocation:
+            return ()
+        %}
+        PyObject *nextLocation(PyObject *page_id)
+        {
+            fz_document *this_doc = (fz_document *) $self;
+            fz_location next_loc, loc;
+            int page_n = -1;
+            PyObject *val;
+            int pno;
+            fz_try(gctx) {
+                val = PySequence_GetItem(page_id, 0);
+                if (!val) THROWMSG("bad page id");
+                int chapter = (int) PyLong_AsLong(val);
+                Py_DECREF(val);
+                if (PyErr_Occurred()) THROWMSG("bad page id");
+
+                val = PySequence_GetItem(page_id, 1);
+                if (!val) THROWMSG("bad page id");
+                pno = (int) PyLong_AsLong(val);
+                Py_DECREF(val);
+                if (PyErr_Occurred()) THROWMSG("bad page id");
+
+                loc = fz_make_location(chapter, pno);
+                next_loc = fz_next_page(gctx, this_doc, loc);
+            }
+            fz_catch(gctx) {
+                PyErr_Clear();
+                return NULL;
+            }
+            return Py_BuildValue("ii", next_loc.chapter, next_loc.page);
+        }
+
+
+        FITZEXCEPTION(location_from_page_number, !result)
+        CLOSECHECK0(location_from_page_number, """Convert pno to (chapter, page).""")
+        PyObject *location_from_page_number(int pno)
+        {
+            fz_document *this_doc = (fz_document *) $self;
+            fz_location loc = fz_make_location(-1, -1);
+            int pageCount = fz_count_pages(gctx, this_doc);
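+            // count negative numbers from the end; the pageCount guard prevents
+            // an endless loop on a document without pages
+            while (pno < 0 && pageCount > 0) pno += pageCount;
+            fz_try(gctx) {
+                if (pno < 0 || pno >= pageCount)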
+                    THROWMSG("bad page number(s)");
+                loc = fz_location_from_page_number(gctx, this_doc, pno);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return Py_BuildValue("ii", loc.chapter, loc.page);
+        }
+
+        FITZEXCEPTION(page_number_from_location, !result)
+        %pythonprepend page_number_from_location%{
+        """Convert (chapter, pno) to page number."""
+        if type(page_id) is int:
+            np = self.pageCount
+            while page_id < 0 and np > 0:
+                page_id += np
+            page_id = (0, page_id)
+        if page_id not in self:
+            raise ValueError("page id not in document")
+        %}
+        PyObject *page_number_from_location(PyObject *page_id)
+        {
+            fz_document *this_doc = (fz_document *) $self;
+            fz_location loc;
+            int page_n = -1;
+            PyObject *val;
+            int pno;
+            fz_try(gctx) {
+                val = PySequence_GetItem(page_id, 0);
+                if (!val) THROWMSG("bad page id");
+                int chapter = (int) PyLong_AsLong(val);
+                Py_DECREF(val);
+                if (PyErr_Occurred()) THROWMSG("bad page id");
+
+                val = PySequence_GetItem(page_id, 1);
+                if (!val) THROWMSG("bad page id");
+                pno = (int) PyLong_AsLong(val);
+                Py_DECREF(val);
+                if (PyErr_Occurred()) THROWMSG("bad page id");
+
+                loc = fz_make_location(chapter, pno);
+                page_n = fz_page_number_from_location(gctx, this_doc, loc);
+            }
+            fz_catch(gctx) {
+                PyErr_Clear();
+                return NULL;
+            }
+            return Py_BuildValue("i", page_n);
+        }
+
+        CLOSECHECK0(_getMetadata, """Get metadata.""")
+        char *_getMetadata(const char *key)
+        {
+            fz_document *doc = (fz_document *) $self;
+            int vsize;
+            char *value;
+            vsize = fz_lookup_metadata(gctx, doc, key, NULL, 0)+1;
+            if(vsize > 1) {
+                value = JM_Alloc(char, vsize);
+                fz_lookup_metadata(gctx, doc, key, value, vsize);
+                return value;
+            }
+            else
+                return NULL;
+        }
+
+        CLOSECHECK0(needsPass, """Indicate password required.""")
+        %pythoncode%{@property%}
+        PyObject *needsPass() {
+            return JM_BOOL(fz_needs_password(gctx, (fz_document *) $self));
+        }
+
+        %pythoncode%{@property%}
+        CLOSECHECK0(language, """Document language.""")
+        PyObject *language()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) return_none;
+            fz_text_language lang = pdf_document_language(gctx, pdf);
+            char buf[8];
+            if (lang == FZ_LANG_UNSET) return_none;
+            return Py_BuildValue("s", fz_string_from_text_language(buf, lang));
+        }
+
+        FITZEXCEPTION(setLanguage, !result)
+        PyObject *setLanguage(char *language=NULL)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                fz_text_language lang;
+                if (!language)
+                    lang = FZ_LANG_UNSET;
+                else
+                    lang = fz_text_language_from_string(language);
+                pdf_set_document_language(gctx, pdf, lang);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            Py_RETURN_TRUE;
+        }
+
+
+        %pythonprepend resolveLink %{
+        """Calculate internal link destination.
+
+        Args:
+            uri: (str) some Link.uri
+            chapters: (bool) whether to use (chapter, page) format
+        Returns:
+            (page_id, x, y) where x, y are point coordinates on the page.
+            page_id is either page number (if chapters=0), or (chapter, pno).
+        """
+        %}
+        PyObject *resolveLink(char *uri=NULL, int chapters=0)
+        {
+            if (!uri) {
+                if (chapters) return Py_BuildValue("(ii)ff", -1, -1, 0, 0);
+                return Py_BuildValue("iff", -1, 0, 0);
+            }
+            fz_document *this_doc = (fz_document *) $self;
+            float xp = 0, yp = 0;
+            fz_location loc = {0, 0};
+            fz_try(gctx) {
+                loc = fz_resolve_link(gctx, (fz_document *) $self, uri, &xp, &yp);
+            }
+            fz_catch(gctx) {
+                if (chapters) return Py_BuildValue("(ii)ff", -1, -1, 0, 0);
+                return Py_BuildValue("iff", -1, 0, 0);
+            }
+            if (chapters)
+                return Py_BuildValue("(ii)ff", loc.chapter, loc.page, xp, yp);
+            int pno = fz_page_number_from_location(gctx, this_doc, loc);
+            return Py_BuildValue("iff", pno, xp, yp);
+        }
+
+        FITZEXCEPTION(layout, !result)
+        CLOSECHECK(layout, """Re-layout a reflowable document.""")
+        %pythonappend layout %{
+            self._reset_page_refs()
+            self.initData()%}
+        PyObject *layout(PyObject *rect = NULL, float width = 0, float height = 0, float fontsize = 11)
+        {
+            fz_document *doc = (fz_document *) $self;
+            if (!fz_is_document_reflowable(gctx, doc)) return_none;
+            fz_try(gctx) {
+                float w = width, h = height;
+                fz_rect r = JM_rect_from_py(rect);
+                if (!fz_is_infinite_rect(r)) {
+                    w = r.x1 - r.x0;
+                    h = r.y1 - r.y0;
+                }
+                if (w <= 0.0f || h <= 0.0f)
+                    THROWMSG("invalid page size");
+                fz_layout_document(gctx, doc, w, h, fontsize);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        FITZEXCEPTION(makeBookmark, !result)
+        CLOSECHECK(makeBookmark, """Make a page pointer before re-laying out the document.""")
+        PyObject *makeBookmark(PyObject *loc)
+        {
+            fz_document *doc = (fz_document *) $self;
+            fz_location location;
+            fz_bookmark mark;
+            fz_try(gctx) {
+                if (JM_INT_ITEM(loc, 0, &location.chapter) == 1)
+                    THROWMSG("Bad location");
+                if (JM_INT_ITEM(loc, 1, &location.page) == 1)
+                    THROWMSG("Bad location");
+                mark = fz_make_bookmark(gctx, doc, location);
+                // no check on mark: 0 is a valid bookmark, e.g. for location (0, 0)
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return PyLong_FromVoidPtr((void *) mark);
+        }
+
+
+        FITZEXCEPTION(findBookmark, !result)
+        CLOSECHECK(findBookmark, """Find the new location after re-laying out the document.""")
+        PyObject *findBookmark(PyObject *bm)
+        {
+            fz_document *doc = (fz_document *) $self;
+            fz_location location;
+            fz_try(gctx) {
+                intptr_t mark = (intptr_t) PyLong_AsVoidPtr(bm);
+                location = fz_lookup_bookmark(gctx, doc, mark);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return Py_BuildValue("ii", location.chapter, location.page);
+        }
+
+
+        CLOSECHECK0(isReflowable, """Check if document is reflowable.""")
+        %pythoncode%{@property%}
+        PyObject *isReflowable()
+        {
+            return JM_BOOL(fz_is_document_reflowable(gctx, (fz_document *) $self));
+        }
+
+        FITZEXCEPTION(_deleteObject, !result)
+        CLOSECHECK0(_deleteObject, """Delete object.""")
+        PyObject *_deleteObject(int xref)
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_specifics(gctx, doc);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                if (!INRANGE(xref, 1, pdf_xref_len(gctx, pdf)-1))
+                    THROWMSG("xref out of range");
+                pdf_delete_object(gctx, pdf, xref);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        CLOSECHECK0(_getPDFroot, """Get xref of PDF catalog.""")
+        PyObject *_getPDFroot()
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_specifics(gctx, doc);
+            int xref = 0;
+            if (!pdf) return Py_BuildValue("i", xref);
+            fz_try(gctx) {
+                pdf_obj *root = pdf_dict_get(gctx, pdf_trailer(gctx, pdf),
+                                             PDF_NAME(Root));
+                xref = pdf_to_num(gctx, root);
+            }
+            fz_catch(gctx) {;}
+            return Py_BuildValue("i", xref);
+        }
+
+        CLOSECHECK0(_getPDFfileid, """Get PDF file id.""")
+        PyObject *_getPDFfileid()
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_specifics(gctx, doc);
+            if (!pdf) return_none;
+            PyObject *idlist = PyList_New(0);
+            fz_buffer *buffer = NULL;
+            unsigned char *hex;
+            pdf_obj *o;
+            int n, i, len;
+            fz_var(buffer);
+
+            fz_try(gctx) {
+                pdf_obj *identity = pdf_dict_get(gctx, pdf_trailer(gctx, pdf),
+                                             PDF_NAME(ID));
+                if (identity)
+                {
+                    n = pdf_array_len(gctx, identity);
+                    for (i = 0; i < n; i++)
+                    {
+                        o = pdf_array_get(gctx, identity, i);
+                        len = (int) pdf_to_str_len(gctx, o);
+                        // two hex digits per byte, plus room for the terminating NUL
+                        buffer = fz_new_buffer(gctx, 2 * len + 1);
+                        fz_buffer_storage(gctx, buffer, &hex);
+                        // file IDs are binary strings: hexlify the raw bytes, not a text decoding
+                        hexlify(len, (unsigned char *) pdf_to_str_buf(gctx, o), hex);
+                        hex[2 * len] = 0;
+                        LIST_APPEND_DROP(idlist, JM_UnicodeFromStr(hex));
+                        fz_drop_buffer(gctx, buffer);
+                        buffer = NULL;
+                    }
+                }
+            }
+            fz_catch(gctx) fz_drop_buffer(gctx, buffer);
+            return idlist;
+        }
+
+        CLOSECHECK0(isPDF, """Check if a PDF document.""")
+        %pythoncode%{@property%}
+        PyObject *isPDF()
+        {
+            if (pdf_specifics(gctx, (fz_document *) $self)) Py_RETURN_TRUE;
+            else Py_RETURN_FALSE;
+        }
+
+        CLOSECHECK0(_hasXrefStream, """Check if xref table is a stream.""")
+        %pythoncode%{@property%}
+        PyObject *_hasXrefStream()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) Py_RETURN_FALSE;
+            if (pdf->has_xref_streams) Py_RETURN_TRUE;
+            Py_RETURN_FALSE;
+        }
+
+        CLOSECHECK0(_hasXrefOldStyle, """Check if xref table is old style.""")
+        %pythoncode%{@property%}
+        PyObject *_hasXrefOldStyle()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) Py_RETURN_FALSE;
+            if (pdf->has_old_style_xrefs) Py_RETURN_TRUE;
+            Py_RETURN_FALSE;
+        }
+
+        CLOSECHECK0(isDirty, """True if PDF has unsaved changes.""")
+        %pythoncode%{@property%}
+        PyObject *isDirty()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) Py_RETURN_FALSE;
+            return JM_BOOL(pdf_has_unsaved_changes(gctx, pdf));
+        }
+
+        CLOSECHECK0(can_save_incrementally, """Check whether incremental saves are possible.""")
+        PyObject *can_save_incrementally()
+        {
+            pdf_document *pdf = pdf_document_from_fz_document(gctx, (fz_document *) $self);
+            if (!pdf) Py_RETURN_FALSE; // gracefully handle non-PDF
+            return JM_BOOL(pdf_can_be_saved_incrementally(gctx, pdf));
+        }
+
+        CLOSECHECK0(authenticate, """Decrypt document.""")
+        %pythonappend authenticate %{
+        if val:  # the doc is decrypted successfully and we init the outline
+            self.isEncrypted = False
+            self.initData()
+            self.thisown = True
+        %}
+        PyObject *authenticate(char *password)
+        {
+            return Py_BuildValue("i", fz_authenticate_password(gctx, (fz_document *) $self, (const char *) password));
+        }
+
+        //---------------------------------------------------------------------
+        // save PDF file
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(save, !result)
+        %pythonprepend save %{
+        """Save PDF to filename."""
+        if self.isClosed or self.isEncrypted:
+            raise ValueError("document closed or encrypted")
+        if type(filename) == str:
+            pass
+        elif str is bytes and type(filename) == unicode:
+            filename = filename.encode('utf8')
+        else:
+            filename = str(filename)
+        if filename == self.name and not incremental:
+            raise ValueError("save to original must be incremental")
+        if self.pageCount < 1:
+            raise ValueError("cannot save with zero pages")
+        if incremental:
+            if self.name != filename or self.stream:
+                raise ValueError("incremental needs original file")
+        %}
+
+        PyObject *save(char *filename, int garbage=0, int clean=0, int deflate=0, int incremental=0, int ascii=0, int expand=0, int linear=0, int pretty=0, int encryption=1, int permissions=-1, char *owner_pw=NULL, char *user_pw=NULL)
+        {
+            pdf_write_options opts = pdf_default_write_options;
+            opts.do_incremental     = incremental;
+            opts.do_ascii           = ascii;
+            opts.do_compress        = deflate;
+            opts.do_compress_images = deflate;
+            opts.do_compress_fonts  = deflate;
+            opts.do_decompress      = expand;
+            opts.do_garbage         = garbage;
+            opts.do_pretty          = pretty;
+            opts.do_linear          = linear;
+            opts.do_clean           = clean;
+            opts.do_sanitize        = clean;
+            opts.do_encrypt         = encryption;
+            opts.permissions        = permissions;
+            if (owner_pw)
+            {
+                fz_strlcpy(opts.opwd_utf8, owner_pw, sizeof(opts.opwd_utf8));  // bounded copy into fixed-size field
+            }
+
+            if (user_pw)
+            {
+                fz_strlcpy(opts.upwd_utf8, user_pw, sizeof(opts.upwd_utf8));  // bounded copy into fixed-size field
+            }
+
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                JM_embedded_clean(gctx, pdf);
+                pdf_save_document(gctx, pdf, filename, &opts);
+                pdf->dirty = 0;
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // write document to memory
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(write, !result)
+        %pythonprepend write %{
+        """Write the PDF to a bytes object."""
+        if self.isClosed or self.isEncrypted:
+            raise ValueError("document closed or encrypted")
+        if self.pageCount < 1:
+            raise ValueError("cannot write with zero pages")%}
+
+        PyObject *write(int garbage=0, int clean=0, int deflate=0,
+                        int ascii=0, int expand=0, int linear=0, int pretty=0,
+                        int encryption=1,
+                        int permissions=-1,
+                        char *owner_pw=NULL,
+                        char *user_pw=NULL)
+        {
+            PyObject *r = NULL;
+            fz_output *out = NULL;
+            fz_buffer *res = NULL;
+            pdf_write_options opts = pdf_default_write_options;
+            opts.do_incremental     = 0;
+            opts.do_ascii           = ascii;
+            opts.do_compress        = deflate;
+            opts.do_compress_images = deflate;
+            opts.do_compress_fonts  = deflate;
+            opts.do_decompress      = expand;
+            opts.do_garbage         = garbage;
+            opts.do_linear          = linear;
+            opts.do_clean           = clean;
+            opts.do_sanitize        = clean;
+            opts.do_pretty          = pretty;
+            opts.do_encrypt         = encryption;
+            opts.permissions        = permissions;
+            if (owner_pw) {
+                fz_strlcpy(opts.opwd_utf8, owner_pw, sizeof(opts.opwd_utf8));  // bounded copy into fixed-size field
+            }
+
+            if (user_pw) {
+                fz_strlcpy(opts.upwd_utf8, user_pw, sizeof(opts.upwd_utf8));  // bounded copy into fixed-size field
+            }
+
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_specifics(gctx, doc);
+            fz_var(out);
+            fz_var(res);
+            fz_var(r);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                if (pdf_count_pages(gctx, pdf) < 1)
+                    THROWMSG("cannot write with zero pages");
+                JM_embedded_clean(gctx, pdf);
+                res = fz_new_buffer(gctx, 8192);
+                out = fz_new_output_with_buffer(gctx, res);
+                pdf_write_document(gctx, pdf, out, &opts);
+                r = JM_BinFromBuffer(gctx, res);
+                pdf->dirty = 0;
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+                fz_drop_output(gctx, out);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return r;
+        }
+
+        //---------------------------------------------------------------------
+        // Insert pages from a source PDF into this PDF.
+        // For reconstructing the links (_do_links method), we must save the
+        // insertion point (start_at) if it was specified as -1.
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(insertPDF, !result)
+        %pythonprepend insertPDF %{
+        """Insert a page range from another PDF.
+
+        Args:
+            docsrc: PDF to copy from. Must be a different object, but may be the same file.
+            from_page: (int) first page of source PDF to copy.
+            to_page: (int) last page of source PDF to copy.
+            start_at: (int) from_page will become this page number in target.
+            rotate: (int) rotate copied pages by this angle (multiple of 90).
+            links: (int/bool) whether to also copy links
+            annots: (int/bool) whether to also copy annotations
+
+        The copy sequence is reversed if from_page > to_page."""
+
+        if self.isClosed or self.isEncrypted:
+            raise ValueError("document closed or encrypted")
+        if id(self) == id(docsrc):
+            raise ValueError("source and target PDF are the same object")
+        sa = start_at
+        if sa < 0:
+            sa = self.pageCount%}
+
+        %pythonappend insertPDF %{
+        self._reset_page_refs()
+        if links:
+            self._do_links(docsrc, from_page = from_page, to_page = to_page,
+                        start_at = sa)%}
+
+        PyObject *
+        insertPDF(struct Document *docsrc,
+            int from_page=-1,
+            int to_page=-1,
+            int start_at=-1,
+            int rotate=-1,
+            int links=1,
+            int annots=1)
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdfout = pdf_specifics(gctx, doc);
+            pdf_document *pdfsrc = pdf_specifics(gctx, (fz_document *) docsrc);
+            int outCount = fz_count_pages(gctx, doc);
+            int srcCount = fz_count_pages(gctx, (fz_document *) docsrc);
+
+            // local copies of page numbers
+            int fp = from_page, tp = to_page, sa = start_at;
+
+            // normalize page numbers
+            fp = MAX(fp, 0);                // -1 = first page
+            fp = MIN(fp, srcCount - 1);     // but do not exceed last page
+
+            if (tp < 0) tp = srcCount - 1;  // -1 = last page
+            tp = MIN(tp, srcCount - 1);     // but do not exceed last page
+
+            if (sa < 0) sa = outCount;      // -1 = behind last page
+            sa = MIN(sa, outCount);         // but that is also the limit
+
+            fz_try(gctx) {
+                if (!pdfout || !pdfsrc) THROWMSG("source or target not a PDF");
+                JM_merge_range(gctx, pdfout, pdfsrc, fp, tp, sa, rotate, links, annots);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdfout->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Create and insert a new page (PDF)
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_newPage, !result)
+        CLOSECHECK(_newPage, """Make a new PDF page.""")
+        %pythonappend _newPage %{self._reset_page_refs()%}
+        PyObject *_newPage(int pno=-1, float width=595, float height=842)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_rect mediabox = fz_unit_rect;
+            mediabox.x1 = width;
+            mediabox.y1 = height;
+            pdf_obj *resources = NULL, *page_obj = NULL;
+            fz_buffer *contents = NULL;
+            fz_var(resources);
+            fz_var(page_obj);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                if (pno < -1) THROWMSG("bad page number(s)");
+                // create the /Resources object; /Contents stays empty (NULL)
+                resources = pdf_add_object_drop(gctx, pdf, pdf_new_dict(gctx, pdf, 1));
+                page_obj = pdf_add_page(gctx, pdf, mediabox, 0, resources, contents);
+                pdf_insert_page(gctx, pdf, pno, page_obj);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, contents);
+                pdf_drop_obj(gctx, page_obj);
+                pdf_drop_obj(gctx, resources);  // the page dict holds its own reference
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Create sub-document to keep only selected pages.
+        // Parameter is a Python sequence of the wanted page numbers.
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(select, !result)
+        %pythonprepend select %{"""Build sub-pdf with page numbers in the list."""
+if self.isClosed or self.isEncrypted:
+    raise ValueError("document closed or encrypted")
+if not self.isPDF:
+    raise ValueError("not a PDF")
+if not hasattr(pyliste, "__getitem__"):
+    raise ValueError("sequence required")
+if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not in range(len(self)):
+    raise ValueError("bad page number(s)")%}
+        %pythonappend select %{self._reset_page_refs()%}
+        PyObject *select(PyObject *pyliste)
+        {
+            // get the underlying PDF document; retainpages() itself reads
+            // the wanted page numbers from the Python sequence
+
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_try(gctx) {
+                // call retainpages (code adapted from MuPDF's pdf-clean-file.c)
+                globals glo = {0};
+                glo.ctx = gctx;
+                glo.doc = pdf;
+                retainpages(gctx, &glo, pyliste);
+                if (pdf->rev_page_map)
+                {
+                    pdf_drop_page_tree(gctx, pdf);
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // remove one page
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_deletePage, !result)
+        PyObject *_deletePage(int pno)
+        {
+            fz_try(gctx) {
+                fz_document *doc = (fz_document *) $self;
+                pdf_document *pdf = pdf_specifics(gctx, doc);
+                ASSERT_PDF(pdf);
+                pdf_delete_page(gctx, pdf, pno);
+                if (pdf->rev_page_map)
+                {
+                    pdf_drop_page_tree(gctx, pdf);
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //********************************************************************
+        // get document permissions
+        //********************************************************************
+        %pythoncode%{@property%}
+        %pythonprepend permissions %{
+        """Document permissions."""
+
+        if self.isEncrypted:
+            return 0
+        %}
+        PyObject *permissions()
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_document_from_fz_document(gctx, doc);
+
+            // for PDF return result of standard function
+            if (pdf)
+                return Py_BuildValue("i", pdf_document_permissions(gctx, pdf));
+
+            // otherwise simulate the PDF return value
+            int perm = (int) 0xFFFFFFFC;  // all permissions granted
+            // now clear the bits of permissions that are not granted
+            if (!fz_has_permission(gctx, doc, FZ_PERMISSION_PRINT))
+                perm &= ~PDF_PERM_PRINT;
+            if (!fz_has_permission(gctx, doc, FZ_PERMISSION_EDIT))
+                perm &= ~PDF_PERM_MODIFY;
+            if (!fz_has_permission(gctx, doc, FZ_PERMISSION_COPY))
+                perm &= ~PDF_PERM_COPY;
+            if (!fz_has_permission(gctx, doc, FZ_PERMISSION_ANNOTATE))
+                perm &= ~PDF_PERM_ANNOTATE;
+            return Py_BuildValue("i", perm);
+        }
+
+        FITZEXCEPTION(_getCharWidths, !result)
+        CLOSECHECK(_getCharWidths, """Return list of glyphs and glyph widths of a font.""")
+        PyObject *_getCharWidths(int xref, char *bfname, char *ext,
+                                 int ordering, int limit, int idx = 0)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            PyObject *wlist = NULL;
+            int i, glyph, mylimit;
+            mylimit = limit;
+            if (mylimit < 256) mylimit = 256;
+            const unsigned char *data;
+            int size, index;
+            fz_font *font = NULL;
+            fz_buffer *buf = NULL;
+            fz_var(wlist);
+            fz_var(font);
+            fz_var(buf);
+
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                if (ordering >= 0) {
+                    data = fz_lookup_cjk_font(gctx, ordering, &size, &index);
+                    font = fz_new_font_from_memory(gctx, NULL, data, size, index, 0);
+                    goto weiter;
+                }
+                data = fz_lookup_base14_font(gctx, bfname, &size);
+                if (data) {
+                    font = fz_new_font_from_memory(gctx, bfname, data, size, 0, 0);
+                    goto weiter;
+                }
+                buf = JM_get_fontbuffer(gctx, pdf, xref);
+                if (!buf) {
+                    fz_throw(gctx, FZ_ERROR_GENERIC, "font at xref %d is not supported", xref);
+                }
+                font = fz_new_font_from_buffer(gctx, NULL, buf, idx, 0);
+
+                weiter:;
+                wlist = PyList_New(0);
+                float adv;
+                for (i = 0; i < mylimit; i++)
+                {
+                    glyph = fz_encode_character(gctx, font, i);
+                    adv = fz_advance_glyph(gctx, font, glyph, 0);
+                    if (ordering >= 0)
+                        glyph = i;
+
+
+                    if (glyph > 0)
+                    {
+                        LIST_APPEND_DROP(wlist, Py_BuildValue("if", glyph, adv));
+                    }
+                    else
+                    {
+                        LIST_APPEND_DROP(wlist, Py_BuildValue("if", glyph, 0.0));
+                    }
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, buf);
+                fz_drop_font(gctx, font);
+            }
+            fz_catch(gctx) {
+                Py_CLEAR(wlist);  // do not leak a partially filled list
+                return NULL;
+            }
+            return wlist;
+        }
+
+        FITZEXCEPTION(_getPageObjNumber, !result)
+        CLOSECHECK0(_getPageObjNumber, """Get (xref, generation) of page number.""")
+        PyObject *_getPageObjNumber(int pno)
+        {
+            fz_document *this_doc = (fz_document *) $self;
+            int pageCount = fz_count_pages(gctx, this_doc);
+            int n = pno;
+            while (n < 0 && pageCount > 0) n += pageCount;  // no endless loop on empty docs
+            pdf_obj *pageref = NULL;
+            fz_var(pageref);
+            pdf_document *pdf = pdf_specifics(gctx, this_doc);
+            fz_try(gctx) {
+                if (n < 0 || n >= pageCount) THROWMSG("bad page number(s)");
+                ASSERT_PDF(pdf);
+                pageref = pdf_lookup_page_obj(gctx, pdf, n);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+
+            return Py_BuildValue("ii", pdf_to_num(gctx, pageref),
+                                       pdf_to_gen(gctx, pageref));
+        }
+
+        FITZEXCEPTION(_getPageInfo, !result)
+        CLOSECHECK(_getPageInfo, """List fonts, images, XObjects used on a page.""")
+        PyObject *_getPageInfo(int pno, int what)
+        {
+            fz_document *doc = (fz_document *) $self;
+            pdf_document *pdf = pdf_specifics(gctx, doc);
+            int pageCount = fz_count_pages(gctx, doc);
+            pdf_obj *pageref, *rsrc;
+            PyObject *liste = NULL;  // returned object
+            int n = pno;  // pno < 0 is allowed
+            while (n < 0 && pageCount > 0) n += pageCount;  // make it non-negative
+            fz_var(liste);
+            fz_try(gctx) {
+                if (n < 0 || n >= pageCount) THROWMSG("bad page number(s)");
+                ASSERT_PDF(pdf);
+                pageref = pdf_lookup_page_obj(gctx, pdf, n);
+                rsrc = pdf_dict_get_inheritable(gctx, pageref, PDF_NAME(Resources));
+                if (!pageref || !rsrc) THROWMSG("cannot retrieve page info");
+                liste = PyList_New(0);
+                JM_scan_resources(gctx, pdf, rsrc, liste, what, 0);
+            }
+            fz_catch(gctx)
+            {
+                Py_XDECREF(liste);
+                return NULL;
+            }
+            return liste;
+        }
+
+        FITZEXCEPTION(extractFont, !result)
+        CLOSECHECK(extractFont, """Get a font by xref.""")
+        PyObject *extractFont(int xref = 0, int info_only = 0)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+
+            fz_buffer *buffer = NULL;
+            pdf_obj *obj = NULL, *basefont, *bname;
+            PyObject *bytes = NULL;
+            char *ext = NULL;
+            PyObject *tuple = NULL;
+            fz_var(obj);
+            fz_var(buffer);
+            fz_var(bytes);
+            fz_var(tuple);
+            fz_try(gctx) {
+                obj = pdf_load_object(gctx, pdf, xref);
+                pdf_obj *type = pdf_dict_get(gctx, obj, PDF_NAME(Type));
+                pdf_obj *subtype = pdf_dict_get(gctx, obj, PDF_NAME(Subtype));
+                if(pdf_name_eq(gctx, type, PDF_NAME(Font)) &&
+                   strncmp(pdf_to_name(gctx, subtype), "CIDFontType", 11) != 0)
+                {
+                    basefont = pdf_dict_get(gctx, obj, PDF_NAME(BaseFont));
+                    if (!basefont || pdf_is_null(gctx, basefont))
+                        bname = pdf_dict_get(gctx, obj, PDF_NAME(Name));
+                    else
+                        bname = basefont;
+                    ext = JM_get_fontextension(gctx, pdf, xref);
+                    if (strcmp(ext, "n/a") != 0 && !info_only)
+                    {
+                        buffer = JM_get_fontbuffer(gctx, pdf, xref);
+                        bytes = JM_BinFromBuffer(gctx, buffer);
+                    }
+                    else
+                    {
+                        bytes = PyBytes_FromString("");
+                    }
+                    tuple = PyTuple_New(4);
+                    PyTuple_SET_ITEM(tuple, 0, JM_EscapeStrFromStr(pdf_to_name(gctx, bname)));
+                    PyTuple_SET_ITEM(tuple, 1, JM_UnicodeFromStr(ext));
+                    PyTuple_SET_ITEM(tuple, 2, JM_UnicodeFromStr(pdf_to_name(gctx, subtype)));
+                    PyTuple_SET_ITEM(tuple, 3, bytes);  // the tuple steals this reference
+                    bytes = NULL;
+                }
+                else
+                {
+                    // not a simple font: empty values ("N" steals the new bytes object)
+                    tuple = Py_BuildValue("sssN", "", "", "", PyBytes_FromString(""));
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, buffer);
+                pdf_drop_obj(gctx, obj);  // pdf_load_object returned a kept reference
+                JM_PyErr_Clear;
+            }
+            fz_catch(gctx)
+            {
+                Py_CLEAR(bytes);
+                Py_CLEAR(tuple);
+                tuple = Py_BuildValue("sssN", "invalid-name", "", "", PyBytes_FromString(""));
+            }
+            return tuple;
+        }
+
+
+        FITZEXCEPTION(extractImage, !result)
+        CLOSECHECK(extractImage, """Get image by xref.""")
+        PyObject *extractImage(int xref)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            pdf_obj *obj = NULL;
+            fz_buffer *res = NULL;
+            fz_image *img = NULL;
+            PyObject *rc = NULL;
+            const char *ext = NULL;
+            const char *cs_name = NULL;
+            int img_type, xres, yres, colorspace;
+            int smask = 0, width, height, bpc;
+            fz_var(img);
+            fz_var(res);
+            fz_var(obj);
+
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                if (!INRANGE(xref, 1, pdf_xref_len(gctx, pdf)-1))
+                    THROWMSG("xref out of range");
+
+                obj = pdf_new_indirect(gctx, pdf, xref, 0);
+                pdf_obj *subtype = pdf_dict_get(gctx, obj, PDF_NAME(Subtype));
+
+                if (!pdf_name_eq(gctx, subtype, PDF_NAME(Image)))
+                    THROWMSG("xref not an image");
+
+                pdf_obj *o = pdf_dict_get(gctx, obj, PDF_NAME(SMask));
+                if (o) smask = pdf_to_num(gctx, o);
+
+                res = pdf_load_raw_stream(gctx, obj);
+                unsigned char *c = NULL;
+                fz_buffer_storage(gctx, res, &c);
+                img_type = fz_recognize_image_format(gctx, c);
+
+                if (img_type != FZ_IMAGE_UNKNOWN)
+                {
+                    img = fz_new_image_from_buffer(gctx, res);
+                    ext = JM_image_extension(img_type);
+                }
+                else
+                {
+                    fz_drop_buffer(gctx, res);
+                    res = NULL;
+                    img = pdf_load_image(gctx, pdf, obj);
+                    res = fz_new_buffer_from_image_as_png(gctx, img,
+                                fz_default_color_params);
+                    ext = "png";
+                }
+                fz_image_resolution(img, &xres, &yres);
+                width = img->w;
+                height = img->h;
+                colorspace = img->n;
+                bpc = img->bpc;
+                cs_name = fz_colorspace_name(gctx, img->colorspace);
+
+                rc = PyDict_New();
+                DICT_SETITEM_DROP(rc, dictkey_ext,
+                                    JM_UnicodeFromStr(ext));
+                DICT_SETITEM_DROP(rc, dictkey_smask,
+                                    Py_BuildValue("i", smask));
+                DICT_SETITEM_DROP(rc, dictkey_width,
+                                    Py_BuildValue("i", width));
+                DICT_SETITEM_DROP(rc, dictkey_height,
+                                    Py_BuildValue("i", height));
+                DICT_SETITEM_DROP(rc, dictkey_colorspace,
+                                    Py_BuildValue("i", colorspace));
+                DICT_SETITEM_DROP(rc, dictkey_bpc,
+                                    Py_BuildValue("i", bpc));
+                DICT_SETITEM_DROP(rc, dictkey_xres,
+                                    Py_BuildValue("i", xres));
+                DICT_SETITEM_DROP(rc, dictkey_yres,
+                                    Py_BuildValue("i", yres));
+                DICT_SETITEM_DROP(rc, dictkey_cs_name,
+                                    JM_UnicodeFromStr(cs_name));
+                DICT_SETITEM_DROP(rc, dictkey_image,
+                                    JM_BinFromBuffer(gctx, res));
+            }
+            fz_always(gctx) {
+                fz_drop_image(gctx, img);
+                fz_drop_buffer(gctx, res);
+                pdf_drop_obj(gctx, obj);
+            }
+
+            fz_catch(gctx)
+            {
+                Py_CLEAR(rc);
+                return_none;
+            }
+            if (!rc)
+                return_none;
+            return rc;
+        }
+
+
+        //---------------------------------------------------------------------
+        // Delete all bookmarks (table of contents)
+        // returns the list of deleted (now available) xref numbers
+        //---------------------------------------------------------------------
+        CLOSECHECK(_delToC, """Delete the TOC.""")
+        %pythonappend _delToC %{self.initData()%}
+        PyObject *_delToC()
+        {
+            PyObject *xrefs = PyList_New(0);          // create Python list
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) return xrefs;                   // not a pdf
+
+            pdf_obj *root, *olroot, *first;
+            int xref_count, olroot_xref, i, xref;
+
+            // get the main root
+            root = pdf_dict_get(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root));
+            // get the outline root
+            olroot = pdf_dict_get(gctx, root, PDF_NAME(Outlines));
+            if (!olroot) return xrefs;                // no outlines or some problem
+
+            first = pdf_dict_get(gctx, olroot, PDF_NAME(First)); // first outline
+
+            xrefs = JM_outline_xrefs(gctx, first, xrefs);
+            xref_count = (int) PyList_Size(xrefs);
+
+            olroot_xref = pdf_to_num(gctx, olroot);        // xref of the outline root
+            pdf_delete_object(gctx, pdf, olroot_xref);     // delete the outline root object
+            pdf_dict_del(gctx, root, PDF_NAME(Outlines));  // remove /Outlines from the catalog
+
+            for (i = 0; i < xref_count; i++)
+            {
+                xref = (int) PyInt_AsLong(PyList_GetItem(xrefs, i));
+                pdf_delete_object(gctx, pdf, xref);      // delete outline item
+            }
+            LIST_APPEND_DROP(xrefs, Py_BuildValue("i", olroot_xref));
+            pdf->dirty = 1;
+            return xrefs;
+        }
+
+        //---------------------------------------------------------------------
+        // Check: is xref a stream object?
+        //---------------------------------------------------------------------
+        CLOSECHECK0(isStream, """Check if xref is a stream object.""")
+        PyObject *isStream(int xref=0)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) Py_RETURN_FALSE;  // not a PDF
+            return JM_BOOL(pdf_obj_num_is_stream(gctx, pdf, xref));
+        }
+
+        //---------------------------------------------------------------------
+        // Return or set NeedAppearances
+        //---------------------------------------------------------------------
+        %pythonprepend need_appearances
+%{"""Get/set the NeedAppearances value."""
+if self.isClosed:
+    raise ValueError("document closed")
+if not self.isFormPDF:
+    return None
+%}
+        PyObject *need_appearances(PyObject *value=NULL)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            int oldval = -1;
+            pdf_obj *app = NULL;
+            char appkey[] = "NeedAppearances";
+            fz_try(gctx) {
+                pdf_obj *form = pdf_dict_getp(gctx, pdf_trailer(gctx, pdf),
+                                "Root/AcroForm");
+                app = pdf_dict_gets(gctx, form, appkey);
+                if (pdf_is_bool(gctx, app)) {
+                    oldval = pdf_to_bool(gctx, app);
+                }
+
+                if (EXISTS(value)) {
+                    pdf_dict_puts_drop(gctx, form, appkey, PDF_TRUE);
+                } else if (value == Py_False) {
+                    pdf_dict_puts_drop(gctx, form, appkey, PDF_FALSE);
+                }
+            }
+            fz_catch(gctx) {
+                return_none;
+            }
+            if (value && value != Py_None) {
+                Py_INCREF(value);  // return a new reference, not a borrowed one
+                return value;
+            }
+            if (oldval >= 0) {
+                return JM_BOOL(oldval);
+            }
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Return the /SigFlags value
+        //---------------------------------------------------------------------
+        CLOSECHECK0(getSigFlags, """Get /SigFlags value.""")
+        PyObject *getSigFlags()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) return Py_BuildValue("i", -1);  // not a PDF
+            int sigflag = -1;
+            fz_try(gctx) {
+                pdf_obj *sigflags = pdf_dict_getl(gctx,
+                                                  pdf_trailer(gctx, pdf),
+                                                  PDF_NAME(Root),
+                                                  PDF_NAME(AcroForm),
+                                                  PDF_NAME(SigFlags),
+                                                  NULL);
+                if (sigflags) {
+                    sigflag = pdf_to_int(gctx, sigflags);
+                }
+            }
+            fz_catch(gctx) {
+                return Py_BuildValue("i", -1);  // any problem
+            }
+            return Py_BuildValue("i", sigflag);  // -1 if /SigFlags is absent
+        }
+
+        //---------------------------------------------------------------------
+        // Check: is this an AcroForm? Returns the field count, else False.
+        //---------------------------------------------------------------------
+        CLOSECHECK0(isFormPDF, """Check if PDF Form document.""")
+        %pythoncode%{@property%}
+        PyObject *isFormPDF()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) Py_RETURN_FALSE;  // not a PDF
+            int count = -1;  // init count
+            fz_try(gctx) {
+                pdf_obj *fields = pdf_dict_getl(gctx,
+                                                pdf_trailer(gctx, pdf),
+                                                PDF_NAME(Root),
+                                                PDF_NAME(AcroForm),
+                                                PDF_NAME(Fields),
+                                                NULL);
+                if (pdf_is_array(gctx, fields)) {
+                    count = pdf_array_len(gctx, fields);
+                }
+            }
+            fz_catch(gctx) {
+                Py_RETURN_FALSE;
+            }
+            if (count >= 0) {
+                return Py_BuildValue("i", count);
+            } else {
+                Py_RETURN_FALSE;
+            }
+        }
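isFormPDF follows the chain trailer, /Root, /AcroForm, /Fields and returns the length of the /Fields array, or False if the chain breaks. An empty /Fields array yields 0, which is falsy just like False. The sketch below models that lookup with nested Python dicts standing in for PDF dictionaries; `form_field_count` is a hypothetical helper for illustration only:

```python
def form_field_count(trailer):
    """Mimic isFormPDF on a nested-dict model of the PDF object tree.

    Follows trailer -> Root -> AcroForm -> Fields and returns the array
    length, or False if any step is missing or Fields is not an array.
    """
    node = trailer
    for key in ("Root", "AcroForm", "Fields"):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return len(node) if isinstance(node, list) else False


doc = {"Root": {"AcroForm": {"Fields": ["f1", "f2"]}}}
print(form_field_count(doc))           # 2
print(form_field_count({"Root": {}}))  # False
```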
+
+        //---------------------------------------------------------------------
+        // Return the list of field font resource names
+        //---------------------------------------------------------------------
+        CLOSECHECK0(FormFonts, """Get list of field font resource names.""")
+        %pythoncode%{@property%}
+        PyObject *FormFonts()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) return_none;           // not a PDF
+            pdf_obj *fonts = NULL;
+            PyObject *liste = PyList_New(0);
+            fz_try(gctx) {
+                fonts = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root),
+                                      PDF_NAME(AcroForm), PDF_NAME(DR), PDF_NAME(Font), NULL);
+                if (fonts && pdf_is_dict(gctx, fonts))       // fonts exist
+                {
+                    int i, n = pdf_dict_len(gctx, fonts);
+                    for (i = 0; i < n; i++)
+                    {
+                        pdf_obj *f = pdf_dict_get_key(gctx, fonts, i);
+                        LIST_APPEND_DROP(liste, JM_UnicodeFromStr(pdf_to_name(gctx, f)));
+                    }
+                }
+            }
+            fz_catch(gctx) {
+                Py_DECREF(liste);  // do not leak the partially filled list
+                return_none;       // any problem yields None
+            }
+            return liste;
+        }
+
+        //---------------------------------------------------------------------
+        // Add a field font
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_addFormFont, !result)
+        CLOSECHECK(_addFormFont, """Add new form font.""")
+        PyObject *_addFormFont(char *name, char *font)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) return_none;  // not a PDF
+            pdf_obj *fonts = NULL;
+            fz_try(gctx) {
+                fonts = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root),
+                             PDF_NAME(AcroForm), PDF_NAME(DR), PDF_NAME(Font), NULL);
+                if (!fonts || !pdf_is_dict(gctx, fonts))
+                    THROWMSG("PDF has no form fonts yet");
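+                pdf_obj *k = pdf_new_name(gctx, (const char *) name);
+                pdf_obj *v = JM_pdf_obj_from_str(gctx, pdf, font);
+                pdf_dict_put(gctx, fonts, k, v);  // dict keeps its own references
+                pdf_drop_obj(gctx, k);
+                pdf_drop_obj(gctx, v);
+            }
+            fz_catch(gctx) {
+                return NULL;  // let FITZEXCEPTION raise the error in Python
+            }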
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Get Xref Number of Outline Root, create it if missing
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getOLRootNumber, !result)
+        CLOSECHECK(_getOLRootNumber, """Get xref of Outline Root, create it if missing.""")
+        PyObject *_getOLRootNumber()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            pdf_obj *root, *olroot, *ind_obj;
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                // get main root
+                root = pdf_dict_get(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root));
+                // get outline root
+                olroot = pdf_dict_get(gctx, root, PDF_NAME(Outlines));
+                if (!olroot)
+                {
+                    olroot = pdf_new_dict(gctx, pdf, 4);
+                    pdf_dict_put(gctx, olroot, PDF_NAME(Type), PDF_NAME(Outlines));
+                    ind_obj = pdf_add_object(gctx, pdf, olroot);
+                    pdf_dict_put(gctx, root, PDF_NAME(Outlines), ind_obj);
+                    olroot = pdf_dict_get(gctx, root, PDF_NAME(Outlines));
+                    pdf_drop_obj(gctx, ind_obj);
+                    pdf->dirty = 1;
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return Py_BuildValue("i", pdf_to_num(gctx, olroot));
+        }
+
+        //---------------------------------------------------------------------
+        // Get a new Xref number
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getNewXref, !result)
+        CLOSECHECK(_getNewXref, """Make new xref.""")
+        PyObject *_getNewXref()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return Py_BuildValue("i", pdf_create_object(gctx, pdf));
+        }
+
+        //---------------------------------------------------------------------
+        // Get Length of Xref
+        //---------------------------------------------------------------------
+        CLOSECHECK0(_getXrefLength, """Get length of xref table.""")
+        PyObject *_getXrefLength()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            int xreflen = 0;
+            if (pdf) xreflen = pdf_xref_len(gctx, pdf);
+            return Py_BuildValue("i", xreflen);
+        }
+
+        //---------------------------------------------------------------------
+        // Get XML Metadata xref
+        //---------------------------------------------------------------------
+        CLOSECHECK0(_getXmlMetadataXref, """Get xref of document XML metadata.""")
+        PyObject *_getXmlMetadataXref()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            pdf_obj *xml;
+            int xref = 0;
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                pdf_obj *root = pdf_dict_get(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root));
+                if (!root) THROWMSG("could not load root object");
+                xml = pdf_dict_gets(gctx, root, "Metadata");
+                if (xml) xref = pdf_to_num(gctx, xml);
+            }
+            fz_catch(gctx) {;}
+            return Py_BuildValue("i", xref);
+        }
+
+        //---------------------------------------------------------------------
+        // Delete XML-based Metadata
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_delXmlMetadata, !result)
+        CLOSECHECK(_delXmlMetadata, """Delete XML metadata.""")
+        PyObject *_delXmlMetadata()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                pdf_obj *root = pdf_dict_get(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root));
+                if (root) pdf_dict_dels(gctx, root, "Metadata");
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Get Object String of xref
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getXrefString, !result)
+        CLOSECHECK0(_getXrefString, """Get xref object source as a string.""")
+        PyObject *_getXrefString(int xref, int compressed=0, int ascii=0)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            pdf_obj *obj = NULL;
+            PyObject *text = NULL;
+            
+            fz_buffer *res=NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                int xreflen = pdf_xref_len(gctx, pdf);
+                if (!INRANGE(xref, 1, xreflen-1))
+                    THROWMSG("xref out of range");
+                obj = pdf_load_object(gctx, pdf, xref);
+                res = JM_object_to_buffer(gctx, pdf_resolve_indirect(gctx, obj), compressed, ascii);
+                text = JM_EscapeStrFromBuffer(gctx, res);
+            }
+            fz_always(gctx) {
+                pdf_drop_obj(gctx, obj);
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return NULL;  // raise the error, e.g. "xref out of range"
+            }
+            return text;
+        }
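Several methods in this section accept an xref only if it lies in 1 to xreflen - 1, via `INRANGE(xref, 1, xreflen-1)`. Entry 0 of a PDF cross-reference table is the head of the free list and never holds an object, and xreflen itself is one past the last entry. A hypothetical pure-Python mirror of that check:

```python
def check_xref(xref, xref_length):
    """Hypothetical mirror of the INRANGE(xref, 1, xreflen-1) test.

    Entry 0 of a PDF cross-reference table heads the free list, so the
    first usable object number is 1; xref_length - 1 is the last one.
    """
    if not 1 <= xref <= xref_length - 1:
        raise ValueError("xref out of range")
    return xref


print(check_xref(1, 10), check_xref(9, 10))  # 1 9
try:
    check_xref(0, 10)
except ValueError as exc:
    print(exc)  # xref out of range
```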
+
+        //---------------------------------------------------------------------
+        // Get String of PDF trailer
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getTrailerString, !result)
+        CLOSECHECK0(_getTrailerString, """Get PDF trailer as a string.""")
+        PyObject *_getTrailerString(int compressed=0, int ascii=0)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) return_none;
+            PyObject *text = NULL;
+            fz_buffer *res=NULL;
+            fz_try(gctx) {
+                res = JM_object_to_buffer(gctx, pdf_trailer(gctx, pdf), compressed, ascii);
+                text = JM_EscapeStrFromBuffer(gctx, res);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return PyUnicode_FromString("PDF trailer damaged");
+            }
+            return text;
+        }
+
+        //---------------------------------------------------------------------
+        // Get the raw (undecoded) stream of an object by xref.
+        // Returns None if the object is not a stream.
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getXrefStreamRaw, !result)
+        CLOSECHECK(_getXrefStreamRaw, """Get xref stream without decompression.""")
+        PyObject *_getXrefStreamRaw(int xref)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            PyObject *r = NULL;
+            fz_var(r);
+            pdf_obj *obj = NULL;
+            fz_var(obj);
+            fz_buffer *res = NULL;
+            fz_var(res);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                int xreflen = pdf_xref_len(gctx, pdf);
+                if (!INRANGE(xref, 1, xreflen-1))
+                    THROWMSG("xref out of range");
+                obj = pdf_new_indirect(gctx, pdf, xref, 0);
+                if (pdf_is_stream(gctx, obj))
+                {
+                    res = pdf_load_raw_stream_number(gctx, pdf, xref);
+                    r = JM_BinFromBuffer(gctx, res);
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+                pdf_drop_obj(gctx, obj);
+            }
+            fz_catch(gctx)
+            {
+                Py_CLEAR(r);
+                return NULL;
+            }
+            if (!r) return_none;  // not a stream: return a new reference to None
+            return r;
+        }
+
+        //---------------------------------------------------------------------
+        // Get the decompressed stream of an object by xref.
+        // Returns None if the object is not a stream.
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getXrefStream, !result)
+        CLOSECHECK(_getXrefStream, """Get decompressed xref stream.""")
+        PyObject *_getXrefStream(int xref)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            PyObject *r = NULL;
+            fz_var(r);
+            pdf_obj *obj = NULL;
+            fz_var(obj);
+            fz_buffer *res = NULL;
+            fz_var(res);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                int xreflen = pdf_xref_len(gctx, pdf);
+                if (!INRANGE(xref, 1, xreflen-1))
+                    THROWMSG("xref out of range");
+                obj = pdf_new_indirect(gctx, pdf, xref, 0);
+                if (pdf_is_stream(gctx, obj))
+                {
+                    res = pdf_load_stream_number(gctx, pdf, xref);
+                    r = JM_BinFromBuffer(gctx, res);
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+                pdf_drop_obj(gctx, obj);
+            }
+            fz_catch(gctx)
+            {
+                Py_CLEAR(r);
+                return NULL;
+            }
+            if (!r) return_none;  // not a stream: return a new reference to None
+            return r;
+        }
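_getXrefStreamRaw returns the stream bytes as stored in the file, while _getXrefStream applies the stream's filters first. For the common /FlateDecode filter, the decoding step is zlib inflation. A standalone sketch with made-up content (no PyMuPDF calls) showing how the two results relate:

```python
import zlib

# Made-up page content as it looks after decoding.
decoded = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"

# A /FlateDecode stream stores zlib-compressed bytes; a raw read returns these.
raw = zlib.compress(decoded)

# A decoding read applies the filter, which for /FlateDecode means inflating.
print(raw == decoded, zlib.decompress(raw) == decoded)  # False True
```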
+
+        //---------------------------------------------------------------------
+        // Update an Xref number with a new object given as a string
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_updateObject, !result)
+        CLOSECHECK(_updateObject, """Replace object definition source.""")
+        PyObject *_updateObject(int xref, char *text, struct Page *page = NULL)
+        {
+            pdf_obj *new_obj;
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                int xreflen = pdf_xref_len(gctx, pdf);
+                if (!INRANGE(xref, 1, xreflen-1))
+                    THROWMSG("xref out of range");
+                // create new object with passed-in string
+                new_obj = JM_pdf_obj_from_str(gctx, pdf, text);
+                pdf_update_object(gctx, pdf, xref, new_obj);
+                pdf_drop_obj(gctx, new_obj);
+                if (page)
+                    JM_refresh_link_table(gctx, pdf_page_from_fz_page(gctx, (fz_page *)page));
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Update a stream identified by its xref
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_updateStream, !result)
+        CLOSECHECK(_updateStream, """Replace xref stream part.""")
+        PyObject *_updateStream(int xref = 0, PyObject *stream = NULL, int new = 0)
+        {
+            pdf_obj *obj = NULL;
+            fz_var(obj);
+            fz_buffer *res = NULL;
+            fz_var(res);
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                int xreflen = pdf_xref_len(gctx, pdf);
+                if (!INRANGE(xref, 1, xreflen-1))
+                    THROWMSG("xref out of range");
+                // get the object
+                obj = pdf_new_indirect(gctx, pdf, xref, 0);
+                if (!new && !pdf_is_stream(gctx, obj))
+                    THROWMSG("xref not a stream object");
+                res = JM_BufferFromBytes(gctx, stream);
+                if (!res) THROWMSG("bad type: 'stream'");
+                JM_update_stream(gctx, pdf, obj, res, 1);
+
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+                pdf_drop_obj(gctx, obj);
+            }
+            fz_catch(gctx)
+                return NULL;
+            pdf->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Add or update metadata based on provided raw string
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_setMetadata, !result)
+        CLOSECHECK(_setMetadata, """Set old style metadata.""")
+        PyObject *_setMetadata(char *text)
+        {
+            pdf_obj *new_info = NULL;
+            fz_var(new_info);
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                // create new /Info object based on passed-in string
+                new_info = JM_pdf_obj_from_str(gctx, pdf, text);
+                pdf_obj *info = pdf_dict_get(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Info));
+                if (info)
+                {
+                    // replace existing /Info object, keeping its xref number
+                    int info_num = pdf_to_num(gctx, info);
+                    pdf_update_object(gctx, pdf, info_num, new_info);
+                }
+                else
+                {
+                    // create new indirect object and put it in the trailer
+                    pdf_obj *new_info_ind = pdf_add_object(gctx, pdf, new_info);
+                    pdf_dict_put_drop(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Info), new_info_ind);
+                }
+            }
+            fz_always(gctx) {
+                pdf_drop_obj(gctx, new_info);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return_none;
+        }
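_setMetadata takes the complete /Info dictionary as PDF source text and either replaces the existing /Info object or adds a new one to the trailer. One way to produce that text from a Python dict is sketched below. `make_info_source` and `pdf_literal` are hypothetical helpers; they assume ASCII values and escape only backslash and parentheses, the characters that need escaping inside PDF literal strings.

```python
def pdf_literal(text):
    """Escape a str for use inside a PDF literal string (...)."""
    return (
        text.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace("(", "\\(")
        .replace(")", "\\)")
    )


def make_info_source(info):
    """Hypothetical helper: build '<</Key(value)...>>' for an /Info dict.

    Keys are PDF names such as Title or Author; empty values are skipped.
    Assumes ASCII values (other text would need UTF-16BE with a BOM).
    """
    parts = [
        "/%s(%s)" % (key, pdf_literal(value))
        for key, value in info.items()
        if value
    ]
    return "<<" + "".join(parts) + ">>"


print(make_info_source({"Title": "Report (draft)", "Author": "A. B."}))
# <</Title(Report \(draft\))/Author(A. B.)>>
```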
+
+        //---------------------------------------------------------------------
+        // create / refresh the page map
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_make_page_map, !result)
+        CLOSECHECK0(_make_page_map, """Make an array page number -> page object.""")
+        PyObject *_make_page_map()
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            if (!pdf) return_none;
+            fz_try(gctx) {
+                pdf_drop_page_tree(gctx, pdf);
+                pdf_load_page_tree(gctx, pdf);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return Py_BuildValue("i", pdf->rev_page_count);
+        }
+
+
+        //---------------------------------------------------------------------
+        // full (deep) copy of one page
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(fullcopyPage, !result)
+        CLOSECHECK0(fullcopyPage, """Make full page duplication.""")
+        %pythonappend fullcopyPage %{self._reset_page_refs()%}
+        PyObject *fullcopyPage(int pno, int to = -1)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            fz_buffer *res = NULL, *nres = NULL;
+            fz_var(res);
+            fz_var(nres);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                // count pages only after making sure this is a PDF
+                int pageCount = pdf_count_pages(gctx, pdf);
+                if (!INRANGE(pno, 0, pageCount - 1) ||
+                    !INRANGE(to, -1, pageCount - 1))
+                    THROWMSG("bad page number(s)");
+
+                pdf_obj *page1 = pdf_resolve_indirect(gctx,
+                                 pdf_lookup_page_obj(gctx, pdf, pno));
+
+                pdf_obj *page2 = pdf_deep_copy_obj(gctx, page1);
+                pdf_obj *old_annots = pdf_dict_get(gctx, page2, PDF_NAME(Annots));
+
+                // copy annotations, skipping Popups and replies (annots with /IRT)
+                if (old_annots) {
+                    int i, n = pdf_array_len(gctx, old_annots);
+                    pdf_obj *new_annots = pdf_new_array(gctx, pdf, n);
+                    for (i = 0; i < n; i++) {
+                        pdf_obj *o = pdf_array_get(gctx, old_annots, i);
+                        pdf_obj *subtype = pdf_dict_get(gctx, o, PDF_NAME(Subtype));
+                        if (pdf_name_eq(gctx, subtype, PDF_NAME(Popup))) continue;
+                        if (pdf_dict_gets(gctx, o, "IRT")) continue;
+                        pdf_obj *copy_o = pdf_deep_copy_obj(gctx,
+                                            pdf_resolve_indirect(gctx, o));
+                        int xref = pdf_create_object(gctx, pdf);
+                        pdf_update_object(gctx, pdf, xref, copy_o);
+                        pdf_drop_obj(gctx, copy_o);
+                        copy_o = pdf_new_indirect(gctx, pdf, xref, 0);
+                        pdf_dict_del(gctx, copy_o, PDF_NAME(Popup));
+                        pdf_dict_del(gctx, copy_o, PDF_NAME(P));
+                        pdf_array_push_drop(gctx, new_annots, copy_o);
+                    }
+                    pdf_dict_put_drop(gctx, page2, PDF_NAME(Annots), new_annots);
+                }
+
+                // copy the old contents stream(s)
+                res = JM_read_contents(gctx, page1);
+
+                // create new /Contents object for page2
+                if (res) {
+                    // placeholder stream, its content is replaced by res below
+                    nres = fz_new_buffer_from_copied_data(gctx, (const unsigned char *) " ", 1);
+                    pdf_obj *contents = pdf_add_stream(gctx, pdf, nres, NULL, 0);
+                    JM_update_stream(gctx, pdf, contents, res, 1);
+                    pdf_dict_put_drop(gctx, page2, PDF_NAME(Contents), contents);
+                }
+
+                // now insert target page, making sure it is an indirect object
+                int xref = pdf_create_object(gctx, pdf);  // get new xref
+                pdf_update_object(gctx, pdf, xref, page2);  // store new page
+                pdf_drop_obj(gctx, page2);  // give up this object for now
+
+                page2 = pdf_new_indirect(gctx, pdf, xref, 0);  // reread object
+                pdf_insert_page(gctx, pdf, to, page2);  // and store the page
+                pdf_drop_obj(gctx, page2);
+            }
+            fz_always(gctx) {
+                pdf_drop_page_tree(gctx, pdf);
+                fz_drop_buffer(gctx, res);
+                fz_drop_buffer(gctx, nres);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
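When fullcopyPage duplicates /Annots, it skips Popup annotations and replies (entries with /IRT), deep-copies the rest into new xrefs, and removes the /Popup and /P keys from each copy. The filtering rule can be modelled with plain dicts; `annots_to_copy` is hypothetical, and its shallow `dict` copy only stands in for pdf_deep_copy_obj:

```python
def annots_to_copy(annots):
    """Model of the /Annots filter in fullcopyPage (illustration only).

    Skips Popup annotations and replies (those with an IRT entry) and
    drops the Popup and P back-references from every copy.
    """
    copies = []
    for annot in annots:
        if annot.get("Subtype") == "Popup" or "IRT" in annot:
            continue
        copy = dict(annot)  # stands in for pdf_deep_copy_obj
        copy.pop("Popup", None)
        copy.pop("P", None)
        copies.append(copy)
    return copies


page_annots = [
    {"Subtype": "Text", "P": 12, "Popup": 21},
    {"Subtype": "Popup", "Parent": 20},
    {"Subtype": "Text", "IRT": 20},
    {"Subtype": "Square", "P": 12},
]
print(annots_to_copy(page_annots))
# [{'Subtype': 'Text'}, {'Subtype': 'Square'}]
```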
+
+
+        //---------------------------------------------------------------------
+        // move or copy one page
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_move_copy_page, !result)
+        CLOSECHECK0(_move_copy_page, """Move or copy a PDF page reference.""")
+        %pythonappend _move_copy_page %{self._reset_page_refs()%}
+        PyObject *_move_copy_page(int pno, int nb, int before, int copy)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self);
+            int i1, i2, pos, count, same = 0;
+            pdf_obj *parent1 = NULL, *parent2 = NULL, *parent = NULL;
+            pdf_obj *kids1, *kids2;
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                // get the two page objects -----------------------------------
+                // locate the /Kids arrays and indices in each
+                pdf_obj *page1 = pdf_lookup_page_loc(gctx, pdf, pno, &parent1, &i1);
+                kids1 = pdf_dict_get(gctx, parent1, PDF_NAME(Kids));
+
+                pdf_obj *page2 = pdf_lookup_page_loc(gctx, pdf, nb, &parent2, &i2);
+                kids2 = pdf_dict_get(gctx, parent2, PDF_NAME(Kids));
+
+                if (before)  // calc index of source page in target /Kids
+                    pos = i2;
+                else
+                    pos = i2 + 1;
+
+                // same /Kids array? ------------------------------------------
+                same = pdf_objcmp(gctx, kids1, kids2);
+
+                // put source page in target /Kids array ----------------------
+                if (!copy && same != 0)  // update parent in page object
+                {
+                    pdf_dict_put(gctx, page1, PDF_NAME(Parent), parent2);
+                }
+                pdf_array_insert(gctx, kids2, page1, pos);
+
+                if (same != 0) // different /Kids arrays ----------------------
+                {
+                    parent = parent2;
+                    while (parent)  // increase /Count objects in parents
+                    {
+                        count = pdf_dict_get_int(gctx, parent, PDF_NAME(Count));
+                        pdf_dict_put_int(gctx, parent, PDF_NAME(Count), count + 1);
+                        parent = pdf_dict_get(gctx, parent, PDF_NAME(Parent));
+                    }
+                    if (!copy)  // delete original item
+                    {
+                        pdf_array_delete(gctx, kids1, i1);
+                        parent = parent1;
+                        while (parent) // decrease /Count objects in parents
+                        {
+                            count = pdf_dict_get_int(gctx, parent, PDF_NAME(Count));
+                            pdf_dict_put_int(gctx, parent, PDF_NAME(Count), count - 1);
+                            parent = pdf_dict_get(gctx, parent, PDF_NAME(Parent));
+                        }
+                    }
+                }
+                else // same /Kids array --------------------------------------
+                {
+                    if (copy) // source page is copied
+                    {
+                        parent = parent2;
+                        while (parent) // increase /Count object in parents
+                        {
+                            count = pdf_dict_get_int(gctx, parent, PDF_NAME(Count));
+                            pdf_dict_put_int(gctx, parent, PDF_NAME(Count), count + 1);
+                            parent = pdf_dict_get(gctx, parent, PDF_NAME(Parent));
+                        }
+                    }
+                    else
+                    {
+                        if (i1 < pos)
+                            pdf_array_delete(gctx, kids1, i1);
+                        else
+                            pdf_array_delete(gctx, kids1, i1 + 1);
+                    }
+                }
+                if (pdf->rev_page_map)  // page map no longer valid: drop it
+                {
+                    pdf_drop_page_tree(gctx, pdf);
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
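Every /Kids insertion or deletion in _move_copy_page is followed by a walk up the /Parent chain that adjusts each /Count by one. For a move between different /Kids arrays, the ancestors of the target gain a page and those of the source lose one, so the root's total stays the same. A minimal model with dicts as page-tree nodes; `adjust_counts` is hypothetical:

```python
def adjust_counts(node, delta):
    """Add delta to /Count of node and of every ancestor via /Parent.

    Nodes are modelled as dicts with a "Count" and an optional "Parent".
    """
    while node is not None:
        node["Count"] += delta
        node = node.get("Parent")


root = {"Count": 4}
left = {"Count": 2, "Parent": root}
right = {"Count": 2, "Parent": root}

# Move one page from the left subtree to the right subtree.
adjust_counts(right, +1)  # after inserting into right's /Kids
adjust_counts(left, -1)   # after deleting from left's /Kids
print(root["Count"], left["Count"], right["Count"])  # 4 1 3
```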
+
+        //---------------------------------------------------------------------
+        // Initialize document: set outline and metadata properties
+        //---------------------------------------------------------------------
+        %pythoncode %{
+            def initData(self):
+                if self.isEncrypted:
+                    raise ValueError("cannot initData - document still encrypted")
+                self._outline = self._loadOutline()
+                keys = {
+                    'format': 'format',
+                    'title': 'info:Title',
+                    'author': 'info:Author',
+                    'subject': 'info:Subject',
+                    'keywords': 'info:Keywords',
+                    'creator': 'info:Creator',
+                    'producer': 'info:Producer',
+                    'creationDate': 'info:CreationDate',
+                    'modDate': 'info:ModDate',
+                }
+                self.metadata = {k: self._getMetadata(v) for k, v in keys.items()}
+                encryption = self._getMetadata('encryption')
+                self.metadata['encryption'] = None if encryption == 'None' else encryption
+
+            outline = property(lambda self: self._outline)
+            _getPageXref = _getPageObjNumber
+
+            def getPageFontList(self, pno, full=False):
+                """Retrieve a list of fonts used on a page.
+                """
+                if self.isClosed or self.isEncrypted:
+                    raise ValueError("document closed or encrypted")
+                if not self.isPDF:
+                    return ()
+                val = self._getPageInfo(pno, 1)
+                if full is False:
+                    return [v[:-1] for v in val]
+                return val
+
+
+            def getPageImageList(self, pno, full=False):
+                """Retrieve a list of images used on a page.
+                """
+                if self.isClosed or self.isEncrypted:
+                    raise ValueError("document closed or encrypted")
+                if not self.isPDF:
+                    return ()
+                val = self._getPageInfo(pno, 2)
+                if full is False:
+                    return [v[:-1] for v in val]
+                return val
+
+
+            def getPageXObjectList(self, pno):
+                """Retrieve a list of XObjects used on a page.
+                """
+                if self.isClosed or self.isEncrypted:
+                    raise ValueError("document closed or encrypted")
+                if not self.isPDF:
+                    return ()
+                val = self._getPageInfo(pno, 3)
+                return val
+
+
+            def copyPage(self, pno, to=-1):
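+                """Copy a page within a PDF document.
+
+                The copy references the same page object as the source; use
+                fullcopyPage for an independent duplicate.
+
+                Args:
+                    pno: source page number.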
+                    to: put before this page, '-1' means after last page.
+                """
+                if self.isClosed:
+                    raise ValueError("document closed")
+
+                pageCount = len(self)
+                if (
+                    pno not in range(pageCount) or
+                    to not in range(-1, pageCount)
+                   ):
+                    raise ValueError("bad page number(s)")
+                before = 1
+                copy = 1
+                if to == -1:
+                    to = pageCount - 1
+                    before = 0
+
+                return self._move_copy_page(pno, to, before, copy)
+
+            def movePage(self, pno, to = -1):
+                """Move a page within a PDF document.
+
+                Args:
+                    pno: source page number.
+                    to: put before this page, '-1' means after last page.
+                """
+                if self.isClosed:
+                    raise ValueError("document closed")
+
+                pageCount = len(self)
+                if (
+                    pno not in range(pageCount) or
+                    to not in range(-1, pageCount)
+                   ):
+                    raise ValueError("bad page number(s)")
+                before = 1
+                copy = 0
+                if to == -1:
+                    to = pageCount - 1
+                    before = 0
+
+                return self._move_copy_page(pno, to, before, copy)
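copyPage and movePage translate the user-facing `to` argument into the (target, before) pair that _move_copy_page expects: -1 becomes "after the last page", any other value means "before page `to`". The same translation in isolation; `move_copy_args` is a hypothetical helper:

```python
def move_copy_args(pno, to, page_count):
    """Mirror the argument handling of copyPage/movePage.

    Returns (target, before) as passed to _move_copy_page.
    """
    if pno not in range(page_count) or to not in range(-1, page_count):
        raise ValueError("bad page number(s)")
    if to == -1:
        return page_count - 1, 0  # insert after the last page
    return to, 1                  # insert before page `to`


print(move_copy_args(0, -1, 5))  # (4, 0)
print(move_copy_args(3, 1, 5))   # (1, 1)
```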
+
+            def deletePage(self, pno = -1):
+                """Delete one page from a PDF.
+                """
+                if not self.isPDF:
+                    raise ValueError("not a PDF")
+                if self.isClosed:
+                    raise ValueError("document closed")
+
+                pageCount = self.pageCount
+                while pno < 0 and pageCount > 0:  # negative numbers count from the end
+                    pno += pageCount
+
+                if pno not in range(pageCount):
+                    raise ValueError("bad page number(s)")
+
+                old_toc = self.getToC(False)
+                new_toc = _toc_remove_page(old_toc, pno+1, pno+1)
+                self._remove_links_to(pno, pno)
+
+                self._deletePage(pno)
+
+                self.setToC(new_toc)
+                self._reset_page_refs()
+
+
+
+            def deletePageRange(self, from_page = -1, to_page = -1):
+                """Delete pages from a PDF.
+                """
+                if not self.isPDF:
+                    raise ValueError("not a PDF")
+                if self.isClosed:
+                    raise ValueError("document closed")
+
+                pageCount = self.pageCount  # page count of document
+                f = from_page  # first page to delete
+                t = to_page  # last page to delete
+                while f < 0 and pageCount > 0:  # negative numbers count from the end
+                    f += pageCount
+                while t < 0 and pageCount > 0:
+                    t += pageCount
+                if not 0 <= f <= t < pageCount:
+                    raise ValueError("bad page number(s)")
+
+                old_toc = self.getToC(False)
+                new_toc = _toc_remove_page(old_toc, f+1, t+1)
+                self._remove_links_to(f, t)
+
+                for i in range(t, f - 1, -1):  # delete pages, last to first
+                    self._deletePage(i)
+
+                self.setToC(new_toc)
+                self._reset_page_refs()
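deletePageRange resolves negative page numbers from the end, validates the range, and then deletes from the last page to the first, so the numbers of pages still waiting to be deleted do not shift. A hypothetical standalone version of the index handling; the `page_count > 0` guard and the `0 <=` lower bound avoid an endless loop and reject any range on a document without pages:

```python
def normalize_range(from_page, to_page, page_count):
    """Resolve negative page numbers and validate a deletion range.

    Hypothetical helper mirroring deletePageRange: -1 is the last page.
    """
    f, t = from_page, to_page
    if page_count > 0:  # on an empty document, adding 0 would loop forever
        while f < 0:
            f += page_count
        while t < 0:
            t += page_count
    if not 0 <= f <= t < page_count:
        raise ValueError("bad page number(s)")
    return f, t


f, t = normalize_range(2, -1, 10)
print(f, t)                       # 2 9
print(list(range(t, f - 1, -1)))  # [9, 8, 7, 6, 5, 4, 3, 2]
```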
+
+
+            def saveIncr(self):
+                """Save PDF incrementally."""
+                return self.save(self.name, incremental=True, encryption=PDF_ENCRYPT_KEEP)
+
+
+            def xrefLength(self):
+                """Return the length of the xref table.
+                """
+                return self._getXrefLength()
+
+
+            def get_pdf_object(self, xref, compressed=False, ascii=False):
+                """Return the object definition of an xref.
+                """
+                return self._getXrefString(xref, compressed, ascii)
+
+
+            def updateObject(self, xref, text, page=None):
+                """Replace the object at xref with text.
+
+                Optionally reload a page.
+                """
+                return self._updateObject(xref, text, page=page)
+
+
+            def xrefStream(self, xref):
+                """Return the decompressed stream content of an xref.
+                """
+                return self._getXrefStream(xref)
+
+
+            def xrefStreamRaw(self, xref):
+                """Return the raw stream content of an xref.
+                """
+                return self._getXrefStreamRaw(xref)
+
+
+            def updateStream(self, xref, stream, new=False):
+                """Replace the stream at xref with stream (bytes).
+                """
+                return self._updateStream(xref, stream, new=new)
+
+
+            def PDFTrailer(self, compressed=False, ascii=False):
+                """Return the PDF trailer string.
+                """
+                return self._getTrailerString(compressed, ascii)
+
+
+            def PDFCatalog(self):
+                """Return the xref of the PDF catalog object.
+                """
+                return self._getPDFroot()
+
+
+            def metadataXML(self):
+                """Get xref of document XML metadata."""
+                return self._getXmlMetadataXref()
+
+
+            def reload_page(self, page):
+                """Make a fresh copy of a page."""
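+                # Callers should continue with the returned object, e.g.
+                #     page = doc.reload_page(page)   # 'doc' is a hypothetical Document
+                # because the Page passed in is erased below. Annot objects
+                # previously created for it are re-parented to the new page.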
+                old_annots = {}  # copy annot references to here
+                pno = page.number  # save the page number
+                for k, v in page._annot_refs.items():  # save the annot dictionary
+                    old_annots[k] = v
+                page._erase()  # remove the page
+                page = None
+                page = self.loadPage(pno)  # reload the page
+
+                # copy annot refs over to the new dictionary
+                page_proxy = weakref.proxy(page)
+                for k, annot in old_annots.items():
+                    annot.parent = page_proxy  # refresh parent to new page
+                    page._annot_refs[k] = annot
+                return page
+
+
+            xrefObject = get_pdf_object
+
+
+            def __repr__(self):
+                m = "closed " if self.isClosed else ""
+                if self.stream is None:
+                    if self.name == "":
+                        return m + "Document(<new PDF, doc# %i>)" % self._graft_id
+                    return m + "Document('%s')" % (self.name,)
+                return m + "Document('%s', <memory, doc# %i>)" % (self.name, self._graft_id)
+
+
+            def __contains__(self, loc):
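+                # 'loc' is either a plain page number or a (chapter, pno) pair,
+                # e.g. "0 in doc" or "(0, 0) in doc" for a hypothetical open
+                # document 'doc'. Note that any negative integer tests True here.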
+                if type(loc) is int:
+                    if loc < self.pageCount:
+                        return True
+                    return False
+                if type(loc) not in (tuple, list) or len(loc) != 2:
+                    return False
+
+                chapter, pno = loc
+                if (type(chapter) != int or
+                    chapter < 0 or 
+                    chapter >= self.chapterCount
+                    ):
+                    return False
+                if (type(pno) != int or
+                    pno < 0 or
+                    pno >= self.chapterPageCount(chapter)
+                    ):
+                    return False
+
+                return True
+
+
+            def __getitem__(self, i=0):
+                if i not in self:
+                    raise IndexError("page not in document")
+                return self.loadPage(i)
+
+            def pages(self, start=None, stop=None, step=None):
+                """Return a generator iterator over a page range.
+
+                Arguments have the same meaning as for the range() built-in.
+                """
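+                # Illustration (hypothetical Document 'doc' with >= 5 pages):
+                #     for page in doc.pages(0, 5, 2):  # yields pages 0, 2, 4
+                #         print(page.number)
+                # A negative 'start' counts from the end; a 'stop' of None or
+                # greater than pageCount is replaced by pageCount.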
+                # set the start value
+                start = start or 0
+                while start < 0:
+                    start += self.pageCount
+                if start not in range(self.pageCount):
+                    raise ValueError("bad start page number")
+
+                # set the stop value
+                stop = stop if stop is not None and stop <= self.pageCount else self.pageCount
+
+                # set the step value
+                if step == 0:
+                    raise ValueError("step must not be zero")
+                if step is None:
+                    if start > stop:
+                        step = -1
+                    else:
+                        step = 1
+
+                for pno in range(start, stop, step):
+                    yield (self.loadPage(pno))
+
+
+            def __len__(self):
+                return self.pageCount
+
+            def _forget_page(self, page):
+                """Remove a page from document page dict."""
+                pid = id(page)
+                if pid in self._page_refs:
+                    self._page_refs[pid] = None
+
+            def _reset_page_refs(self):
+                """Invalidate all pages in document dictionary."""
+                if self.isClosed:
+                    return
+                for page in self._page_refs.values():
+                    if page:
+                        page._erase()
+                        page = None
+                self._page_refs.clear()
+
+            def __del__(self):
+                if hasattr(self, "_reset_page_refs"):
+                    self._reset_page_refs()
+                if hasattr(self, "Graftmaps"):
+                    for gmap in self.Graftmaps:
+                        self.Graftmaps[gmap] = None
+                if hasattr(self, "this") and self.thisown:
+                    self.__swig_destroy__(self)
+                    self.thisown = False
+
+                self.Graftmaps = {}
+                self.ShownPages = {}
+                self.stream = None
+                self._reset_page_refs = DUMMY
+                self.__swig_destroy__ = DUMMY
+                self.isClosed = True
+
+            def __enter__(self):
+                return self
+
+            def __exit__(self, *args):
+                self.close()
+            %}
+    }
+};
+
+/*****************************************************************************/
+// fz_page
+/*****************************************************************************/
+%nodefaultctor;
+struct Page {
+    %extend {
+        ~Page()
+        {
+            DEBUGMSG1("Page");
+            fz_page *this_page = (fz_page *) $self;
+            fz_drop_page(gctx, this_page);
+            DEBUGMSG2;
+        }
+        //---------------------------------------------------------------------
+        // bound()
+        //---------------------------------------------------------------------
+        PARENTCHECK(bound, """Get page rectangle.""")
+        %pythonappend bound %{val = Rect(val)%}
+        PyObject *bound() {
+            fz_rect rect = fz_bound_page(gctx, (fz_page *) $self);
+            return JM_py_from_rect(rect);
+        }
+        %pythoncode %{rect = property(bound, doc="page rectangle")%}
+
+        //---------------------------------------------------------------------
+        // Page.getImageBbox
+        //---------------------------------------------------------------------
+        %pythonprepend getImageBbox %{
+        """Get rectangle occupied by image 'name'.
+
+        'name' is either an item of the page's full image list, or the name
+        string by which the page references the image."""
+        CheckParent(self)
+        doc = self.parent
+        if doc.isClosed or doc.isEncrypted:
+            raise ValueError("doc is closed or encrypted")
+        inf_rect = Rect(1, 1, -1, -1)
+        if type(name) in (list, tuple):
+            if not type(name[-1]) is int:
+                raise ValueError("need a full page image list item")
+            item = name
+            if item[-1] != 0:
+                raise ValueError("unsupported image item")
+        else:
+            imglist = [i for i in doc.getPageImageList(self.number, True) if i[-1] == 0 and name == i[-3]]
+            if len(imglist) == 1:
+                item = imglist[0]
+            else:
+                raise ValueError("no valid image found")%}
+        %pythonappend getImageBbox %{
+        if not bool(val):
+            return inf_rect
+        rc = inf_rect
+        for v in val:
+            if v[0] == item[-3]:
+                rc = Quad(v[1]).rect
+                break
+        val = rc * self.transformationMatrix%}
+        PyObject *
+        getImageBbox(PyObject *name)
+        {
+            pdf_page *pdf_page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            PyObject *rc = NULL;
+            fz_try(gctx) {
+                rc = JM_image_reporter(gctx, pdf_page);
+            }
+            fz_catch(gctx) {
+                return_none;
+            }
+            return rc;
+        }
+
+        //---------------------------------------------------------------------
+        // run()
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(run, !result)
+        PARENTCHECK(run, """Run page through a device.""")
+        PyObject *run(struct DeviceWrapper *dw, PyObject *m)
+        {
+            fz_try(gctx) {
+                fz_run_page(gctx, (fz_page *) $self, dw->device, JM_matrix_from_py(m), NULL);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Page.getTextPage
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_get_text_page, !result)
+        struct TextPage *
+        _get_text_page(int flags=0)
+        {
+            fz_stext_page *textpage=NULL;
+            fz_try(gctx) {
+                textpage = JM_new_stext_page_from_page(gctx, (fz_page *) $self, flags);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct TextPage *) textpage;
+        }
+        %pythoncode %{
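+        def getTextPage(self, flags=0):
+            """Create a TextPage for the page.
+
+            The page rotation is temporarily set to 0 during extraction and
+            restored afterwards."""
+            CheckParent(self)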
+            old_rotation = self.rotation
+            if old_rotation != 0:
+                self.setRotation(0)
+            try:
+                textpage = self._get_text_page(flags=flags)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            return textpage
+        %}
+
+        //---------------------------------------------------------------------
+        // Page.language
+        //---------------------------------------------------------------------
+        %pythoncode%{@property%}
+        %pythonprepend language %{"""Page language."""%}
+        PyObject *language()
+        {
+            pdf_page *pdfpage = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!pdfpage) return_none;
+            pdf_obj *lang = pdf_dict_get_inheritable(gctx, pdfpage->obj, PDF_NAME(Lang));
+            if (!lang) return_none;
+            return Py_BuildValue("s", pdf_to_str_buf(gctx, lang));
+        }
+
+
+        //---------------------------------------------------------------------
+        // Page.setLanguage
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setLanguage, !result)
+        PARENTCHECK(setLanguage, """Set PDF page default language.""")
+        PyObject *setLanguage(char *language=NULL)
+        {
+            pdf_page *pdfpage = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(pdfpage);
+                fz_text_language lang;
+                char buf[8];
+                if (!language) {
+                    pdf_dict_del(gctx, pdfpage->obj, PDF_NAME(Lang));
+                } else {
+                    lang = fz_text_language_from_string(language);
+                    pdf_dict_put_text_string(gctx, pdfpage->obj,
+                        PDF_NAME(Lang),
+                        fz_string_from_text_language(buf, lang));
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            Py_RETURN_TRUE;
+        }
+
+
+        //---------------------------------------------------------------------
+        // Page.getSVGimage
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(getSVGimage, !result)
+        PARENTCHECK(getSVGimage, """Make SVG image from page.""")
+        PyObject *getSVGimage(PyObject *matrix = NULL)
+        {
+            fz_rect mediabox = fz_bound_page(gctx, (fz_page *) $self);
+            fz_device *dev = NULL;
+            fz_buffer *res = NULL;
+            PyObject *text = NULL;
+            fz_matrix ctm = JM_matrix_from_py(matrix);
+            fz_output *out = NULL;
+            fz_separations *seps = NULL;
+            fz_var(out);
+            fz_var(dev);
+            fz_var(res);
+            fz_rect tbounds = mediabox;
+            tbounds = fz_transform_rect(tbounds, ctm);
+
+            fz_try(gctx) {
+                res = fz_new_buffer(gctx, 1024);
+                out = fz_new_output_with_buffer(gctx, res);
+                dev = fz_new_svg_device(gctx, out,
+                                        tbounds.x1-tbounds.x0,  // width
+                                        tbounds.y1-tbounds.y0,  // height
+                                        FZ_SVG_TEXT_AS_PATH, 1);
+                fz_run_page(gctx, (fz_page *) $self, dev, ctm, NULL);
+                fz_close_device(gctx, dev);
+                text = JM_EscapeStrFromBuffer(gctx, res);
+            }
+            fz_always(gctx) {
+                fz_drop_device(gctx, dev);
+                fz_drop_output(gctx, out);
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return text;
+        }
+
+        //---------------------------------------------------------------------
+        // page addCaretAnnot
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_caret_annot, !result)
+        struct Annot *
+        _add_caret_annot(PyObject *point)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                annot = pdf_create_annot(gctx, page, PDF_ANNOT_CARET);
+                if (point)
+                {
+                    fz_point p = JM_point_from_py(point);
+                    fz_rect r = pdf_annot_rect(gctx, annot);
+                    r = fz_make_rect(p.x, p.y, p.x + r.x1 - r.x0, p.y + r.y1 - r.y0);
+                    pdf_set_annot_rect(gctx, annot, r);
+                }
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // page addRedactAnnot
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_redact_annot, !result)
+        struct Annot *
+        _add_redact_annot(PyObject *quad,
+            char *text=NULL,
+            const char *da_str=NULL,
+            int align=0,
+            PyObject *fill=NULL,
+            PyObject *text_color=NULL)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            float fcol[4] = {1, 1, 1, 0};
+            int nfcol = 0, i;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                annot = pdf_create_annot(gctx, page, PDF_ANNOT_REDACT);
+                fz_quad q = JM_quad_from_py(quad);
+                fz_rect r = fz_rect_from_quad(q);
+
+                // TODO calculate de-rotated rect
+                pdf_set_annot_rect(gctx, annot, r);
+                if (EXISTS(fill)) {
+                    JM_color_FromSequence(fill, &nfcol, fcol);
+                    pdf_obj *arr = pdf_new_array(gctx, page->doc, nfcol);
+                    for (i = 0; i < nfcol; i++)
+                    {
+                        pdf_array_push_real(gctx, arr, fcol[i]);
+                    }
+                    pdf_dict_put_drop(gctx, annot->obj, PDF_NAME(IC), arr);
+                }
+                if (text) {
+                    pdf_dict_puts_drop(gctx, annot->obj, "OverlayText",
+                                       pdf_new_text_string(gctx, text));
+                    pdf_dict_put_text_string(gctx, annot->obj, PDF_NAME(DA), da_str);
+                    pdf_dict_put_int(gctx, annot->obj, PDF_NAME(Q), (int64_t) align);
+                }
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+        //---------------------------------------------------------------------
+        // page addLineAnnot
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_line_annot, !result)
+        struct Annot *
+        _add_line_annot(PyObject *p1, PyObject *p2)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                annot = pdf_create_annot(gctx, page, PDF_ANNOT_LINE);
+                fz_point a = JM_point_from_py(p1);
+                fz_point b = JM_point_from_py(p2);
+                pdf_set_annot_line(gctx, annot, a, b);
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+        //---------------------------------------------------------------------
+        // page addTextAnnot
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_text_annot, !result)
+        struct Annot *
+        _add_text_annot(PyObject *point,
+            char *text,
+            char *icon=NULL)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            fz_rect r;
+            fz_point p = JM_point_from_py(point);
+            fz_var(annot);
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                annot = pdf_create_annot(gctx, page, PDF_ANNOT_TEXT);
+                r = pdf_annot_rect(gctx, annot);
+                r = fz_make_rect(p.x, p.y, p.x + r.x1 - r.x0, p.y + r.y1 - r.y0);
+                pdf_set_annot_rect(gctx, annot, r);
+                int flags = PDF_ANNOT_IS_PRINT;
+                pdf_set_annot_flags(gctx, annot, flags);
+                pdf_set_annot_contents(gctx, annot, text);
+                if (icon) {
+                    pdf_set_annot_icon_name(gctx, annot, icon);
+                }
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+                pdf_set_annot_rect(gctx, annot, r);
+                pdf_set_annot_flags(gctx, annot, flags);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+        //---------------------------------------------------------------------
+        // page addInkAnnot
+        //---------------------------------------------------------------------
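+        // Expected input for _add_ink_annot: a sequence of strokes, each a
+        // sequence of (x, y) pairs, for example
+        //     [[(50, 50), (100, 80)], [(60, 120), (90, 150), (120, 120)]]
+        // Every point is mapped through the inverse of the page transformation
+        // matrix before it is appended to the /InkList array.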
+        FITZEXCEPTION(_add_ink_annot, !result)
+        struct Annot *
+        _add_ink_annot(PyObject *list)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            PyObject *p = NULL, *sublist = NULL;
+            pdf_obj *inklist = NULL, *stroke = NULL;
+            fz_matrix ctm, inv_ctm;
+            fz_point point;
+            fz_var(annot);
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                if (!PySequence_Check(list)) THROWMSG("arg must be a sequence");
+                pdf_page_transform(gctx, page, NULL, &ctm);
+                inv_ctm = fz_invert_matrix(ctm);
+                annot = pdf_create_annot(gctx, page, PDF_ANNOT_INK);
+                Py_ssize_t i, j, n0 = PySequence_Size(list), n1;
+                inklist = pdf_new_array(gctx, annot->page->doc, n0);
+
+                for (j = 0; j < n0; j++) {
+                    sublist = PySequence_ITEM(list, j);
+                    n1 = PySequence_Size(sublist);
+                    stroke = pdf_new_array(gctx, annot->page->doc, 2 * n1);
+
+                    for (i = 0; i < n1; i++) {
+                        p = PySequence_ITEM(sublist, i);
+                        if (!PySequence_Check(p) || PySequence_Size(p) != 2)
+                            THROWMSG("3rd level entries must be pairs of floats");
+                        point = fz_transform_point(JM_point_from_py(p), inv_ctm);
+                        pdf_array_push_real(gctx, stroke, point.x);
+                        pdf_array_push_real(gctx, stroke, point.y);
+                        Py_CLEAR(p);  // release the new reference from PySequence_ITEM
+                    }
+
+                    pdf_array_push_drop(gctx, inklist, stroke);
+                    stroke = NULL;
+                    Py_CLEAR(sublist);
+                }
+
+                pdf_dict_put_drop(gctx, annot->obj, PDF_NAME(InkList), inklist);
+                inklist = NULL;
+                pdf_dirty_annot(gctx, annot);
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+
+            fz_catch(gctx) {
+                Py_CLEAR(p);
+                Py_CLEAR(sublist);
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+        //---------------------------------------------------------------------
+        // page addStampAnnot
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_stamp_annot, !result)
+        struct Annot *
+        _add_stamp_annot(PyObject *rect, int stamp=0)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            pdf_obj *stamp_id[] = {PDF_NAME(Approved), PDF_NAME(AsIs),
+                                   PDF_NAME(Confidential), PDF_NAME(Departmental),
+                                   PDF_NAME(Experimental), PDF_NAME(Expired),
+                                   PDF_NAME(Final), PDF_NAME(ForComment),
+                                   PDF_NAME(ForPublicRelease), PDF_NAME(NotApproved),
+                                   PDF_NAME(NotForPublicRelease), PDF_NAME(Sold),
+                                   PDF_NAME(TopSecret), PDF_NAME(Draft)};
+            int n = nelem(stamp_id);
+            pdf_obj *name = stamp_id[0];
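+            // 'stamp' indexes stamp_id[]: 0 = Approved, 1 = AsIs, ...,
+            // 13 = Draft. Values outside 0..n-1 fall back to Approved.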
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                fz_rect r = JM_rect_from_py(rect);
+                if (fz_is_infinite_rect(r) || fz_is_empty_rect(r))
+                    THROWMSG("rect must be finite and not empty");
+                if (INRANGE(stamp, 0, n-1))
+                    name = stamp_id[stamp];
+                annot = pdf_create_annot(gctx, page, PDF_ANNOT_STAMP);
+                pdf_set_annot_rect(gctx, annot, r);
+                pdf_dict_put(gctx, annot->obj, PDF_NAME(Name), name);
+                pdf_set_annot_contents(gctx, annot,
+                        pdf_dict_get_name(gctx, annot->obj, PDF_NAME(Name)));
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+        //---------------------------------------------------------------------
+        // page addFileAnnot
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_file_annot, !result)
+        struct Annot *
+        _add_file_annot(PyObject *point,
+            PyObject *buffer,
+            char *filename,
+            char *ufilename=NULL,
+            char *desc=NULL,
+            char *icon=NULL)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            char *uf = ufilename, *d = desc;
+            if (!ufilename) uf = filename;
+            if (!desc) d = filename;
+            fz_buffer *filebuf = NULL;
+            fz_rect r;
+            fz_point p = JM_point_from_py(point);
+            fz_var(annot);
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                filebuf = JM_BufferFromBytes(gctx, buffer);
+                if (!filebuf) THROWMSG("bad type: 'buffer'");
+                annot = pdf_create_annot(gctx, page, PDF_ANNOT_FILE_ATTACHMENT);
+                r = pdf_annot_rect(gctx, annot);
+                r = fz_make_rect(p.x, p.y, p.x + r.x1 - r.x0, p.y + r.y1 - r.y0);
+                pdf_set_annot_rect(gctx, annot, r);
+                int flags = PDF_ANNOT_IS_PRINT;
+                pdf_set_annot_flags(gctx, annot, flags);
+
+                if (icon)
+                    pdf_set_annot_icon_name(gctx, annot, icon);
+
+                pdf_obj *val = JM_embed_file(gctx, page->doc, filebuf,
+                                    filename, uf, d, 1);
+                pdf_dict_put(gctx, annot->obj, PDF_NAME(FS), val);
+                pdf_dict_put_text_string(gctx, annot->obj, PDF_NAME(Contents), filename);
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+                pdf_set_annot_rect(gctx, annot, r);
+                pdf_set_annot_flags(gctx, annot, flags);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // page: add a text marker annotation
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_text_marker, !result)
+
+        %pythonprepend _add_text_marker %{
+        CheckParent(self)
+        if not self.parent.isPDF:
+            raise ValueError("not a PDF")%}
+
+        %pythonappend _add_text_marker %{
+        if not val:
+            return None
+        val.parent = weakref.proxy(self)
+        self._annot_refs[id(val)] = val%}
+
+        struct Annot *
+        _add_text_marker(PyObject *quads, int annot_type)
+        {
+            pdf_page *pdfpage = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            PyObject *item = NULL;
+            int rotation = JM_page_rotation(gctx, pdfpage);
+            fz_quad q;
+            fz_var(annot);
+            fz_var(item);
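+            // The page's /Rotate value is set to 0 while the annotation is
+            // created (presumably so the quads are interpreted without page
+            // rotation); the fz_always block below restores it, also on error.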
+            fz_try(gctx) {
+                if (rotation != 0) {
+                    pdf_dict_put_int(gctx, pdfpage->obj, PDF_NAME(Rotate), 0);
+                }
+                annot = pdf_create_annot(gctx, pdfpage, annot_type);
+                Py_ssize_t i, len = PySequence_Size(quads);
+                for (i = 0; i < len; i++) {
+                    item = PySequence_ITEM(quads, i);
+                    q = JM_quad_from_py(item);
+                    Py_DECREF(item);
+                    pdf_add_annot_quad_point(gctx, annot, q);
+                }
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+            fz_always(gctx) {
+                if (rotation != 0) {
+                    pdf_dict_put_int(gctx, pdfpage->obj, PDF_NAME(Rotate), rotation);
+                }
+            }
+            fz_catch(gctx) {
+                pdf_drop_annot(gctx, annot);
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // page: add circle or rectangle annotation
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_square_or_circle, !result)
+        struct Annot *
+        _add_square_or_circle(PyObject *rect, int annot_type)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                fz_rect r = JM_rect_from_py(rect);
+                if (fz_is_infinite_rect(r) || fz_is_empty_rect(r))
+                    THROWMSG("rect must be finite and not empty");
+                annot = pdf_create_annot(gctx, page, annot_type);
+                pdf_set_annot_rect(gctx, annot, r);
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // page: add multiline annotation
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_multiline, !result)
+        struct Annot *
+        _add_multiline(PyObject *points, int annot_type)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *annot = NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                Py_ssize_t i, n = PySequence_Size(points);
+                if (n < 2) THROWMSG("bad list of points");
+                annot = pdf_create_annot(gctx, page, annot_type);
+                for (i = 0; i < n; i++) {
+                    PyObject *p = PySequence_ITEM(points, i);
+                    if (PySequence_Size(p) != 2) {
+                        Py_DECREF(p);
+                        THROWMSG("bad list of points");
+                    }
+                    fz_point point = JM_point_from_py(p);
+                    Py_DECREF(p);
+                    pdf_add_annot_vertex(gctx, annot, point);
+                }
+
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // page addFreetextAnnot
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_add_freetext_annot, !result)
+        struct Annot *
+        _add_freetext_annot(PyObject *rect, char *text,
+            float fontsize=11,
+            char *fontname=NULL,
+            PyObject *text_color=NULL,
+            PyObject *fill_color=NULL,
+            int align=0,
+            int rotate=0)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            float fcol[4] = {1, 1, 1, 1}; // fill color: white
+            int nfcol = 0;
+            JM_color_FromSequence(fill_color, &nfcol, fcol);
+            float tcol[4] = {0, 0, 0, 0}; // std. text color: black
+            int ntcol = 0;
+            JM_color_FromSequence(text_color, &ntcol, tcol);
+            fz_rect r = JM_rect_from_py(rect);
+            pdf_annot *annot = NULL;
+            fz_try(gctx) {
+                if (fz_is_infinite_rect(r) || fz_is_empty_rect(r))
+                    THROWMSG("rect must be finite and not empty");
+                annot = pdf_create_annot(gctx, page, PDF_ANNOT_FREE_TEXT);
+                pdf_set_annot_contents(gctx, annot, text);
+                pdf_set_annot_rect(gctx, annot, r);
+                pdf_dict_put_int(gctx, annot->obj, PDF_NAME(Rotate), rotate);
+                pdf_dict_put_int(gctx, annot->obj, PDF_NAME(Q), align);
+
+                if (fill_color) {
+                    pdf_set_annot_color(gctx, annot, nfcol, fcol);
+                }
+
+                // insert the default appearance string
+                JM_make_annot_DA(gctx, annot, ntcol, tcol, fontname, fontsize);
+                JM_add_annot_id(gctx, annot, "fitzannot");
+                pdf_update_annot(gctx, annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+
+    %pythoncode %{
+        @property
+        def rotationMatrix(self):
+            """Reflects page rotation."""
+            return Matrix(TOOLS._rotate_matrix(self))
+
+        @property
+        def derotationMatrix(self):
+            """Reflects page de-rotation."""
+            return Matrix(TOOLS._derotate_matrix(self))
+
+        def addCaretAnnot(self, point):
+            """Add a 'Caret' annotation."""
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_caret_annot(point)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addStrikeoutAnnot(self, quads=None, start=None, stop=None, clip=None):
+            """Add a 'StrikeOut' annotation."""
+            if quads is None:
+                q = get_highlight_selection(self, start=start, stop=stop, clip=clip)
+            else:
+                q = CheckMarkerArg(quads)
+            return self._add_text_marker(q, PDF_ANNOT_STRIKE_OUT)
+
+
+        def addUnderlineAnnot(self, quads=None, start=None, stop=None, clip=None):
+            """Add an 'Underline' annotation."""
+            if quads is None:
+                q = get_highlight_selection(self, start=start, stop=stop, clip=clip)
+            else:
+                q = CheckMarkerArg(quads)
+            return self._add_text_marker(q, PDF_ANNOT_UNDERLINE)
+
+
+        def addSquigglyAnnot(self, quads=None, start=None,
+                             stop=None, clip=None):
+            """Add a 'Squiggly' annotation."""
+            if quads is None:
+                q = get_highlight_selection(self, start=start, stop=stop, clip=clip)
+            else:
+                q = CheckMarkerArg(quads)
+            return self._add_text_marker(q, PDF_ANNOT_SQUIGGLY)
+
+
+        def addHighlightAnnot(self, quads=None, start=None,
+                              stop=None, clip=None):
+            """Add a 'Highlight' annotation."""
+            if quads is None:
+                q = get_highlight_selection(self, start=start, stop=stop, clip=clip)
+            else:
+                q = CheckMarkerArg(quads)
+            return self._add_text_marker(q, PDF_ANNOT_HIGHLIGHT)
+
+
+        def addRectAnnot(self, rect):
+            """Add a 'Square' (rectangle) annotation."""
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_square_or_circle(rect, PDF_ANNOT_SQUARE)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addCircleAnnot(self, rect):
+            """Add a 'Circle' (ellipse, oval) annotation."""
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_square_or_circle(rect, PDF_ANNOT_CIRCLE)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addTextAnnot(self, point, text, icon="Note"):
+            """Add a 'Text' (sticky note) annotation."""
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_text_annot(point, text, icon=icon)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addLineAnnot(self, p1, p2):
+            """Add a 'Line' annotation."""
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_line_annot(p1, p2)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addPolylineAnnot(self, points):
+            """Add a 'PolyLine' annotation."""
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_multiline(points, PDF_ANNOT_POLY_LINE)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addPolygonAnnot(self, points):
+            """Add a 'Polygon' annotation."""
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_multiline(points, PDF_ANNOT_POLYGON)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addStampAnnot(self, rect, stamp=0):
+            """Add a ('rubber') 'Stamp' annotation."""
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_stamp_annot(rect, stamp)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addInkAnnot(self, handwriting):
+            """Add an 'Ink' ('handwriting') annotation.
+
+            The argument must be a list of lists of point_likes.
+            """
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_ink_annot(handwriting)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addFileAnnot(self, point,
+            buffer,
+            filename,
+            ufilename=None,
+            desc=None,
+            icon=None):
+            """Add a 'FileAttachment' annotation."""
+
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_file_annot(point,
+                            buffer,
+                            filename,
+                            ufilename=ufilename,
+                            desc=desc,
+                            icon=icon)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addFreetextAnnot(self, rect, text, fontsize=12,
+                             fontname=None, text_color=None,
+                             fill_color=None, align=0, rotate=0):
+            """Add a 'FreeText' annotation."""
+
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_freetext_annot(rect, text, fontsize=fontsize,
+                        fontname=fontname, text_color=text_color,
+                        fill_color=fill_color, align=align, rotate=rotate)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            return annot
+
+
+        def addRedactAnnot(self, quad, text=None, fontname=None,
+                           fontsize=11, align=0, fill=None, text_color=None,
+                           cross_out=True):
+            """Add a 'Redact' annotation."""
+            da_str = None
+            if text:
+                CheckColor(fill)
+                CheckColor(text_color)
+                if not fontname:
+                    fontname = "Helv"
+                if not fontsize:
+                    fontsize = 11
+                if not text_color:
+                    text_color = (0, 0, 0)
+                if hasattr(text_color, "__float__"):
+                    text_color = (text_color, text_color, text_color)
+                if len(text_color) > 3:
+                    text_color = text_color[:3]
+                fmt = "{:g} {:g} {:g} rg /{f:s} {s:g} Tf"
+                da_str = fmt.format(*text_color, f=fontname, s=fontsize)
+                if fill is None:
+                    fill = (1, 1, 1)
+                if fill:
+                    if hasattr(fill, "__float__"):
+                        fill = (fill, fill, fill)
+                    if len(fill) > 3:
+                        fill = fill[:3]
+
+            old_rotation = annot_preprocess(self)
+            try:
+                annot = self._add_redact_annot(quad, text=text, da_str=da_str,
+                           align=align, fill=fill)
+            finally:
+                if old_rotation != 0:
+                    self.setRotation(old_rotation)
+            annot_postprocess(self, annot)
+            #------------------------------------------------------------------
+            # change the generated appearance to show a crossed-out rectangle
+            #------------------------------------------------------------------
+            if cross_out:
+                ap_tab = annot._getAP().splitlines()[:-1]  # drop last line: leading command + 4 path commands remain
+                _, LL, LR, UR, UL = ap_tab
+                ap_tab.append(LR)
+                ap_tab.append(LL)
+                ap_tab.append(UR)
+                ap_tab.append(LL)
+                ap_tab.append(UL)
+                ap_tab.append(b"S")
+                ap = b"\n".join(ap_tab)
+                annot._setAP(ap, 0)
+            return annot
+        %}
+
+
+        //---------------------------------------------------------------------
+        // page load annot by name or xref
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_load_annot, !result)
+        struct Annot *
+        _load_annot(char *name, int xref)
+        {
+            pdf_annot *annot = NULL;
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                if (xref == 0)
+                    annot = JM_get_annot_by_name(gctx, page, name);
+                else
+                    annot = JM_get_annot_by_xref(gctx, page, xref);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Annot *) annot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // page get list of annot names
+        //---------------------------------------------------------------------
+        PARENTCHECK(annot_names, """List of names of annotations, fields and links.""")
+        PyObject *annot_names()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!page) return_none;
+            return JM_get_annot_id_list(gctx, page);
+        }
+
+
+        //---------------------------------------------------------------------
+        // page retrieve list of annotation xrefs
+        //---------------------------------------------------------------------
+        PARENTCHECK(annot_xrefs, """List of xref numbers of annotations, fields and links.""")
+        PyObject *annot_xrefs()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!page) return_none;
+            return JM_get_annot_xref_list(gctx, page);
+        }
+
+
+        %pythoncode %{
+        def loadAnnot(self, ident):
+            """Load an annot by name (/NM key) or xref.
+
+            Args:
+                ident: identifier, either name (str) or xref (int).
+            """
+
+            CheckParent(self)
+            if type(ident) is str:
+                xref = 0
+                name = ident
+            elif type(ident) is int:
+                xref = ident
+                name = None
+            else:
+                raise ValueError("identifier must be string or integer")
+            val = self._load_annot(name, xref)
+            if not val:
+                return val
+            val.thisown = True
+            val.parent = weakref.proxy(self)
+            self._annot_refs[id(val)] = val
+            return val
+
+        load_annot = loadAnnot
+
+
+        #---------------------------------------------------------------------
+        # page addWidget
+        #---------------------------------------------------------------------
+        def addWidget(self, widget):
+            """Add a 'Widget' (form field)."""
+            CheckParent(self)
+            doc = self.parent
+            if not doc.isPDF:
+                raise ValueError("not a PDF")
+            widget._validate()
+            annot = self._addWidget(widget.field_type, widget.field_name)
+            if not annot:
+                return None
+            annot.thisown = True
+            annot.parent = weakref.proxy(self) # owning page object
+            self._annot_refs[id(annot)] = annot
+            widget.parent = annot.parent
+            widget._annot = annot
+            widget.update()
+            return annot
+        %}
+
+        FITZEXCEPTION(_addWidget, !result)
+        struct Annot *_addWidget(int field_type, char *field_name)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_document *pdf = page->doc;
+            pdf_annot *annot = NULL;
+            fz_var(annot);
+            fz_try(gctx) {
+                annot = JM_create_widget(gctx, pdf, page, field_type, field_name);
+                if (!annot) THROWMSG("could not create widget");
+                JM_add_annot_id(gctx, annot, "fitzwidget");
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            annot = pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+        //---------------------------------------------------------------------
+        // Page.getDisplayList
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(getDisplayList, !result)
+        %pythonprepend getDisplayList %{
+        """Make a DisplayList from the page for Pixmap generation.
+
+        Include (default) or exclude annotations."""
+
+        CheckParent(self)
+        %}
+        struct DisplayList *getDisplayList(int annots=1)
+        {
+            fz_display_list *dl = NULL;
+            fz_try(gctx) {
+                if (annots) {
+                    dl = fz_new_display_list_from_page(gctx, (fz_page *) $self);
+                } else {
+                    dl = fz_new_display_list_from_page_contents(gctx, (fz_page *) $self);
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct DisplayList *) dl;
+        }
+
+
+        //---------------------------------------------------------------------
+        // Page apply redactions
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_apply_redactions, !result)
+        PyObject *_apply_redactions()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            int success = 0;
+            pdf_redact_options opts;
+            opts.no_black_boxes = 1;  // no black boxes
+            opts.keep_images = 0;  // do not keep images
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                success = pdf_redact_page(gctx, page->doc, page, &opts);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return JM_BOOL(success);
+        }
+
+
+        //---------------------------------------------------------------------
+        // Page._makePixmap
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_makePixmap, !result)
+        struct Pixmap *
+        _makePixmap(struct Document *doc,
+            PyObject *ctm,
+            struct Colorspace *cs,
+            int alpha=0,
+            int annots=1,
+            PyObject *clip=NULL)
+        {
+            fz_pixmap *pix = NULL;
+            fz_try(gctx) {
+                pix = JM_pixmap_from_page(gctx, (fz_document *) doc, (fz_page *) $self, ctm, (fz_colorspace *) cs, alpha, annots, clip);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pix;
+        }
+
+
+        //---------------------------------------------------------------------
+        // Page.setMediaBox
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setMediaBox, !result)
+        PARENTCHECK(setMediaBox, """Set the MediaBox.""")
+        PyObject *setMediaBox(PyObject *rect)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                fz_rect mediabox = JM_rect_from_py(rect);
+                if (fz_is_empty_rect(mediabox) ||
+                    fz_is_infinite_rect(mediabox)) {
+                    THROWMSG("rect must be finite and not empty");
+                }
+                pdf_dict_put_rect(gctx, page->obj, PDF_NAME(MediaBox), mediabox);
+                pdf_dict_put_rect(gctx, page->obj, PDF_NAME(CropBox), mediabox);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            page->doc->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Page.setCropBox
+        // ATTENTION: This will also change the value returned by Page.bound()
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setCropBox, !result)
+        PARENTCHECK(setCropBox, """Set the CropBox.""")
+        PyObject *setCropBox(PyObject *rect)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                fz_rect mediabox = pdf_bound_page(gctx, page);
+                pdf_obj *o = pdf_dict_get_inheritable(gctx, page->obj, PDF_NAME(MediaBox));
+                if (o) mediabox = pdf_to_rect(gctx, o);
+                fz_rect cropbox = fz_empty_rect;
+                fz_rect r = JM_rect_from_py(rect);
+                cropbox.x0 = r.x0;
+                cropbox.y0 = mediabox.y1 - r.y1;  // flip y: PyMuPDF origin is top-left, PDF origin is bottom-left
+                cropbox.x1 = r.x1;
+                cropbox.y1 = mediabox.y1 - r.y0;
+                pdf_dict_put_drop(gctx, page->obj, PDF_NAME(CropBox),
+                                  pdf_new_rect(gctx, page->doc, cropbox));
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            page->doc->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // loadLinks()
+        //---------------------------------------------------------------------
+        PARENTCHECK(loadLinks, """Get first Link.""")
+        %pythonappend loadLinks %{
+            if val:
+                val.thisown = True
+                val.parent = weakref.proxy(self) # owning page object
+                self._annot_refs[id(val)] = val
+                if self.parent.isPDF:
+                    val.xref = self._getLinkXrefs()[0]
+                else:
+                    val.xref = 0
+        %}
+        struct Link *loadLinks()
+        {
+            fz_link *l = NULL;
+            fz_try(gctx) {
+                l = fz_load_links(gctx, (fz_page *) $self);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Link *) l;
+        }
+        %pythoncode %{firstLink = property(loadLinks, doc="First link on page")%}
+
+        //---------------------------------------------------------------------
+        // firstAnnot
+        //---------------------------------------------------------------------
+        PARENTCHECK(firstAnnot, """First annotation.""")
+        %pythonappend firstAnnot %{
+        if val:
+            val.thisown = True
+            val.parent = weakref.proxy(self) # owning page object
+            self._annot_refs[id(val)] = val
+        %}
+        %pythoncode %{@property%}
+        struct Annot *firstAnnot()
+        {
+            pdf_annot *annot = NULL;
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (page)
+            {
+                annot = pdf_first_annot(gctx, page);
+                if (annot) pdf_keep_annot(gctx, annot);
+            }
+            return (struct Annot *) annot;
+        }
+
+        //---------------------------------------------------------------------
+        // firstWidget
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(firstWidget, """First widget/field.""")
+        %pythonappend firstWidget %{
+        if val:
+            val.thisown = True
+            val.parent = weakref.proxy(self) # owning page object
+            self._annot_refs[id(val)] = val
+            widget = Widget()
+            TOOLS._fill_widget(val, widget)
+            val = widget
+        %}
+        struct Annot *firstWidget()
+        {
+            pdf_annot *annot = NULL;
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (page)
+            {
+                annot = pdf_first_widget(gctx, page);
+                if (annot) pdf_keep_annot(gctx, annot);
+            }
+            return (struct Annot *) annot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // Page.deleteLink() - delete link
+        //---------------------------------------------------------------------
+        PARENTCHECK(deleteLink, """Delete a Link.""")
+        %pythonappend deleteLink
+%{if linkdict["xref"] == 0: return
+try:
+    linkid = linkdict["id"]
+    linkobj = self._annot_refs[linkid]
+    linkobj._erase()
+except:
+    pass
+%}
+        void deleteLink(PyObject *linkdict)
+        {
+            if (!PyDict_Check(linkdict)) return; // have no dictionary
+            fz_try(gctx) {
+                pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+                if (!page) goto finished;  // have no PDF
+                int xref = (int) PyInt_AsLong(PyDict_GetItem(linkdict, dictkey_xref));
+                if (xref < 1) goto finished;  // invalid xref
+                pdf_obj *annots = pdf_dict_get(gctx, page->obj, PDF_NAME(Annots));
+                if (!annots) goto finished;  // have no annotations
+                int len = pdf_array_len(gctx, annots);
+                int i, oxref = 0;
+
+                for (i = 0; i < len; i++) {
+                    oxref = pdf_to_num(gctx, pdf_array_get(gctx, annots, i));
+                    if (xref == oxref) break;        // found xref in annotations
+                }
+
+                if (xref != oxref) goto finished;  // xref not in annotations
+                pdf_array_delete(gctx, annots, i);   // delete entry in annotations
+                pdf_delete_object(gctx, page->doc, xref);      // delete link object
+                pdf_dict_put(gctx, page->obj, PDF_NAME(Annots), annots);
+                JM_refresh_link_table(gctx, page);            // reload link / annot tables
+                page->doc->dirty = 1;
+                finished:;
+            }
+            fz_catch(gctx) {;}
+        }
+
+        //---------------------------------------------------------------------
+        // Page.deleteAnnot() - delete annotation and return the next one
+        //---------------------------------------------------------------------
+        %pythonprepend deleteAnnot %{
+        """Delete annot and return next one."""
+        CheckParent(self)
+        CheckParent(annot)%}
+
+        %pythonappend deleteAnnot %{
+        if val:
+            val.thisown = True
+            val.parent = weakref.proxy(self) # owning page object
+            val.parent._annot_refs[id(val)] = val
+        annot._erase()
+        %}
+
+        struct Annot *deleteAnnot(struct Annot *annot)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_annot *irt_annot = NULL;
+            while (1)  // first loop through all /IRT annots and remove them
+            {
+                irt_annot = JM_find_annot_irt(gctx, (pdf_annot *) annot);
+                if (!irt_annot)  // no more there
+                    break;
+                JM_delete_annot(gctx, page, irt_annot);
+            }
+            pdf_annot *nextannot = pdf_next_annot(gctx, (pdf_annot *) annot);  // store next
+            JM_delete_annot(gctx, page, (pdf_annot *) annot);
+            if (nextannot)
+            {
+                nextannot = pdf_keep_annot(gctx, nextannot);
+            }
+            page->doc->dirty = 1;
+            return (struct Annot *) nextannot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // MediaBox: get the /MediaBox (non-PDF: the page rectangle)
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(MediaBox, """The MediaBox.""")
+        %pythonappend MediaBox %{val = Rect(val)%}
+        PyObject *MediaBox()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!page)
+                return JM_py_from_rect(fz_bound_page(gctx, (fz_page *) $self));
+            return JM_py_from_rect(JM_mediabox(gctx, page));
+        }
+
+
+        //---------------------------------------------------------------------
+        // CropBox: get the /CropBox (non-PDF: the page rectangle)
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(CropBox, """The CropBox.""")
+        %pythonappend CropBox %{val = Rect(val)%}
+        PyObject *CropBox()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!page)
+                return JM_py_from_rect(fz_bound_page(gctx, (fz_page *) $self));
+            return JM_py_from_rect(JM_cropbox(gctx, page));
+        }
+
+
+        //---------------------------------------------------------------------
+        // CropBox position: x0, y0 of /CropBox
+        //---------------------------------------------------------------------
+        %pythoncode %{
+        @property
+        def CropBoxPosition(self):
+            return self.CropBox.tl
+        %}
+
+
+        //---------------------------------------------------------------------
+        // rotation - return page rotation
+        //---------------------------------------------------------------------
+        PARENTCHECK(rotation, """Page rotation.""")
+        %pythoncode %{@property%}
+        int rotation()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!page) return 0;
+            return JM_page_rotation(gctx, page);
+        }
+
+        /*********************************************************************/
+        // setRotation() - set page rotation
+        /*********************************************************************/
+        FITZEXCEPTION(setRotation, !result)
+        PARENTCHECK(setRotation, """Set page rotation.""")
+        PyObject *setRotation(int rotation)
+        {
+            fz_try(gctx) {
+                pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+                ASSERT_PDF(page);
+                int rot = JM_norm_rotation(rotation);
+                pdf_dict_put_int(gctx, page->obj, PDF_NAME(Rotate), (int64_t) rot);
+                page->doc->dirty = 1;
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        /*********************************************************************/
+        // Page._addAnnot_FromString
+        // Add new links provided as an array of string object definitions.
+        /*********************************************************************/
+        FITZEXCEPTION(_addAnnot_FromString, !result)
+        PARENTCHECK(_addAnnot_FromString, """Add Link/Annot from object source.""")
+        PyObject *_addAnnot_FromString(PyObject *linklist)
+        {
+            pdf_obj *annots, *annot, *ind_obj, *new_array;
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            PyObject *txtpy;
+            char *text;
+            int lcount = (int) PySequence_Size(linklist); // new object count
+            if (lcount < 1) return_none;
+            int i;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                // get existing annots array
+                annots = pdf_dict_get(gctx, page->obj, PDF_NAME(Annots));
+                if (annots) {
+                    new_array = annots;
+                } else {
+                    new_array = pdf_new_array(gctx, page->doc, lcount);
+                    pdf_dict_put_drop(gctx, page->obj, PDF_NAME(Annots), new_array);
+                    new_array = pdf_dict_get(gctx, page->obj, PDF_NAME(Annots));
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+
+            // extract object sources from Python list and store as annotations
+            fz_var(text);   // both are read after a possible longjmp
+            fz_var(txtpy);
+            for (i = 0; i < lcount; i++) {
+                txtpy = NULL;
+                fz_try(gctx) {
+                    text = NULL;
+                    txtpy = PySequence_ITEM(linklist, (Py_ssize_t) i);
+                    text = JM_Python_str_AsChar(txtpy);
+                    if (!text) THROWMSG("non-string linklist item");
+                    annot = JM_pdf_obj_from_str(gctx, page->doc, text);
+                    JM_Python_str_DelForPy3(text);
+                    ind_obj = pdf_add_object(gctx, page->doc, annot);
+                    pdf_array_push_drop(gctx, new_array, ind_obj);
+                    pdf_drop_obj(gctx, annot);
+                }
+                fz_catch(gctx) {
+                    if (text)
+                        PySys_WriteStderr("%s (%i): '%s'\n", fz_caught_message(gctx), i, text);
+                    else
+                        PySys_WriteStderr("%s (%i)\n", fz_caught_message(gctx), i);
+                    JM_Python_str_DelForPy3(text);
+                    PyErr_Clear();
+                }
+                Py_XDECREF(txtpy);  // PySequence_ITEM returns a new reference
+            }
+            fz_try(gctx) {
+                JM_refresh_link_table(gctx, page);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            page->doc->dirty = 1;
+            return_none;
+        }
+
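`_addAnnot_FromString` parses each list item with `JM_pdf_obj_from_str` and appends the resulting object to the page's `/Annots` array, so each item must be the source text of a complete annotation dictionary. A minimal sketch of producing one such string for a URI link, assuming the link annotation keys from the PDF specification (`/Type /Annot`, `/Subtype /Link`, `/Rect`, `/Border`, `/A` with `/S /URI`); `pdf_literal` and `uri_link_source` are hypothetical helpers, not PyMuPDF API:

```python
def pdf_literal(text: str) -> str:
    """Escape text as a PDF literal string: backslash, '(' and ')'."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def uri_link_source(rect, uri: str) -> str:
    """Return PDF source text for a /Link annotation that opens `uri`.

    rect is (x0, y0, x1, y1) in PDF user space (origin at bottom left).
    Hypothetical helper for illustration only.
    """
    rect_src = "[%g %g %g %g]" % tuple(rect)
    return ("<</Type/Annot/Subtype/Link/Rect" + rect_src
            + "/Border[0 0 0]/A<</S/URI/URI" + pdf_literal(uri) + ">>>>")


# a list of such strings is the kind of input _addAnnot_FromString iterates over
linklist = [uri_link_source((72, 700, 200, 714), "https://example.com")]
print(linklist[0])
```

Items that fail to parse are reported on stderr and skipped, so one malformed string does not abort the whole list.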
+        //---------------------------------------------------------------------
+        // Page._getLinkXrefs - get list of link xref numbers.
+        // Returns an empty list for non-PDF pages and pages without /Annots.
+        //---------------------------------------------------------------------
+        PyObject *_getLinkXrefs()
+        {
+            pdf_obj *annots, *annots_arr, *link, *obj;
+            int i, lcount;
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            PyObject *linkxrefs = PyList_New(0);
+            if (!page) return linkxrefs;  // empty list for non-PDF
+            annots = pdf_dict_get(gctx, page->obj, PDF_NAME(Annots));
+            if (!annots) return linkxrefs;  // no links on this page
+            if (pdf_is_indirect(gctx, annots))
+                annots_arr = pdf_resolve_indirect(gctx, annots);
+            else
+                annots_arr = annots;
+            lcount = pdf_array_len(gctx, annots_arr);
+            for (i = 0; i < lcount; i++)
+            {
+                link = pdf_array_get(gctx, annots_arr, i);
+                obj = pdf_dict_get(gctx, link, PDF_NAME(Subtype));
+                if (pdf_name_eq(gctx, obj, PDF_NAME(Link)))
+                {
+                    LIST_APPEND_DROP(linkxrefs, Py_BuildValue("i", pdf_to_num(gctx, link)));
+                }
+            }
+            return linkxrefs;
+        }
+
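The loop in `_getLinkXrefs` keeps only the `/Annots` entries whose `/Subtype` is `/Link` and collects their object numbers. The same filter, modelled on plain dictionaries rather than MuPDF objects (the `"xref"` / `"Subtype"` keys are an assumption of this sketch, not a real data structure of the library):

```python
from typing import Dict, List


def link_xrefs(annots: List[Dict[str, object]]) -> List[int]:
    """Return the xrefs of annotations whose /Subtype is /Link.

    Each annotation is modelled as {"xref": int, "Subtype": str}.
    Mirrors the logic of _getLinkXrefs, not the MuPDF object API.
    """
    return [a["xref"] for a in annots if a.get("Subtype") == "Link"]


page_annots = [
    {"xref": 12, "Subtype": "Link"},
    {"xref": 13, "Subtype": "Text"},
    {"xref": 14, "Subtype": "Link"},
]
print(link_xrefs(page_annots))
```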
+        //---------------------------------------------------------------------
+        // Page clean contents stream
+        //---------------------------------------------------------------------
+        PARENTCHECK(_cleanContents, """Clean page /Contents object(s).""")
+        PyObject *_cleanContents()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!page)
+            {
+                return_none;
+            }
+            pdf_filter_options filter = {
+                NULL,  // opaque
+                NULL,  // image filter
+                NULL,  // text filter
+                NULL,  // after text
+                NULL,  // end page
+                1,     // recurse: true
+                1,     // instance forms
+                1,     // sanitize plus filtering
+                0      // do not ascii-escape binary data
+                }; 
+            fz_try(gctx) {
+                pdf_filter_page_contents(gctx, page->doc, page, &filter);
+            }
+            fz_catch(gctx) {
+                return_none;
+            }
+            page->doc->dirty = 1;
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Show a page of a (possibly different) PDF on this page as a Form XObject
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_showPDFpage, !result)
+        PyObject *_showPDFpage(struct Page *fz_srcpage, int overlay=1, PyObject *matrix=NULL, int xref=0, PyObject *clip = NULL, struct Graftmap *graftmap = NULL, char *_imgname = NULL)
+        {
+            pdf_obj *xobj1, *xobj2, *resources;
+            fz_buffer *res=NULL, *nres=NULL;
+            fz_rect cropbox = JM_rect_from_py(clip);
+            fz_matrix mat = JM_matrix_from_py(matrix);
+            int rc_xref = xref;
+            fz_try(gctx) {
+                pdf_page *tpage = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+                pdf_obj *tpageref = tpage->obj;
+                pdf_document *pdfout = tpage->doc;    // target PDF
+
+                //-------------------------------------------------------------
+                // convert the source page to a Form XObject
+                //-------------------------------------------------------------
+                xobj1 = JM_xobject_from_page(gctx, pdfout, (fz_page *) fz_srcpage,
+                                             xref, (pdf_graft_map *) graftmap);
+                if (!rc_xref) rc_xref = pdf_to_num(gctx, xobj1);
+
+                //-------------------------------------------------------------
+                // create referencing XObject (controls display on target page)
+                //-------------------------------------------------------------
+                // fill reference to xobj1 into the /Resources
+                //-------------------------------------------------------------
+                pdf_obj *subres1 = pdf_new_dict(gctx, pdfout, 5);
+                pdf_dict_puts(gctx, subres1, "fullpage", xobj1);
+                pdf_obj *subres  = pdf_new_dict(gctx, pdfout, 5);
+                pdf_dict_put_drop(gctx, subres, PDF_NAME(XObject), subres1);
+
+                res = fz_new_buffer(gctx, 20);
+                fz_append_string(gctx, res, "/fullpage Do");
+
+                xobj2 = pdf_new_xobject(gctx, pdfout, cropbox, mat, subres, res);
+
+                pdf_drop_obj(gctx, subres);
+                fz_drop_buffer(gctx, res);
+
+                //-------------------------------------------------------------
+                // update target page with xobj2:
+                //-------------------------------------------------------------
+                // 1. insert XObject in /Resources
+                //-------------------------------------------------------------
+                resources = pdf_dict_get_inheritable(gctx, tpageref, PDF_NAME(Resources));
+                subres = pdf_dict_get(gctx, resources, PDF_NAME(XObject));
+                if (!subres) {
+                    subres = pdf_new_dict(gctx, pdfout, 10);
+                    pdf_dict_putl_drop(gctx, tpageref, subres, PDF_NAME(Resources), PDF_NAME(XObject), NULL);  // page now owns subres
+                }
+
+                pdf_dict_puts(gctx, subres, _imgname, xobj2);
+
+                //-------------------------------------------------------------
+                // 2. make and insert new Contents object
+                //-------------------------------------------------------------
+                nres = fz_new_buffer(gctx, 50);       // buffer for Do-command
+                fz_append_string(gctx, nres, " q /");    // Do-command
+                fz_append_string(gctx, nres, _imgname);
+                fz_append_string(gctx, nres, " Do Q ");
+
+                JM_insert_contents(gctx, pdfout, tpageref, nres, overlay);
+                fz_drop_buffer(gctx, nres);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return Py_BuildValue("i", rc_xref);
+        }
+
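`_showPDFpage` uses two XObjects: the converted source page is registered as `/fullpage` inside a referencing XObject whose stream is just `/fullpage Do`, and the target page gets a small `q /<name> Do Q` snippet in a new `/Contents` stream. A sketch of those strings plus the overlay choice, assuming `JM_insert_contents` appends the new stream when `overlay` is true (drawn last, on top) and prepends it otherwise; both function names are hypothetical:

```python
def show_page_streams(imgname: str):
    """Return (referencing_xobject_stream, page_snippet) as _showPDFpage builds them."""
    inner = "/fullpage Do"                # stream of the referencing XObject
    snippet = " q /%s Do Q " % imgname    # new /Contents piece on the target page
    return inner, snippet


def insert_contents(contents: list, snippet: bytes, overlay: bool) -> list:
    """Model of overlay handling: append (foreground) or prepend (background)."""
    return contents + [snippet] if overlay else [snippet] + contents


inner, snippet = show_page_streams("Xobj1")
print(repr(inner), repr(snippet))
print(insert_contents([b"old"], snippet.encode(), overlay=False))
```

Wrapping the `Do` in `q`/`Q` keeps any graphics state changes local to the inserted page.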
+        //---------------------------------------------------------------------
+        // insert an image
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_insertImage, !result)
+        PyObject *_insertImage(const char *filename=NULL, struct Pixmap *pixmap=NULL, PyObject *stream=NULL, int overlay=1, PyObject *matrix=NULL,
+        const char *_imgname=NULL, PyObject *_imgpointer=NULL)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_document *pdf;
+            fz_pixmap *pm = NULL;
+            fz_pixmap *pix = NULL;
+            fz_image *mask = NULL;
+            fz_separations *seps = NULL;
+            pdf_obj *resources, *xobject, *ref;
+            fz_buffer *nres = NULL,  *imgbuf = NULL;
+            fz_matrix mat = JM_matrix_from_py(matrix); // pre-calculated
+
+            const char *template = " q %g %g %g %g %g %g cm /%s Do Q ";
+            fz_image *zimg = NULL, *image = NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                //-------------------------------------------------------------
+                // create the image
+                //-------------------------------------------------------------
+                if (filename || EXISTS(stream) || EXISTS(_imgpointer))
+                {
+                    if (filename) {
+                        image = fz_new_image_from_file(gctx, filename);
+                    } else if (EXISTS(stream)) {
+                        imgbuf = JM_BufferFromBytes(gctx, stream);
+                        image = fz_new_image_from_buffer(gctx, imgbuf);
+                    } else {  // fz_image pointer has been handed in
+                        image = (fz_image *)PyLong_AsVoidPtr(_imgpointer);
+                    }
+
+                    // test for alpha (which would require making an SMask)
+                    pix = fz_get_pixmap_from_image(gctx, image, NULL, NULL, 0, 0);
+                    int xres, yres;
+                    fz_image_resolution(image, &xres, &yres);
+                    pix->xres = xres;
+                    pix->yres = yres;
+                    if (pix->alpha == 1) {  // have alpha: create an SMask
+                        pm = fz_convert_pixmap(gctx, pix, NULL, NULL, NULL, fz_default_color_params, 1);
+                        pm->alpha = 0;
+                        pm->colorspace = fz_keep_colorspace(gctx, fz_device_gray(gctx));
+                        mask = fz_new_image_from_pixmap(gctx, pm, NULL);
+                        zimg = fz_new_image_from_pixmap(gctx, pix, mask);
+                        fz_drop_image(gctx, image);
+                        image = zimg;
+                        zimg = NULL;
+                    }
+                } else {  // pixmap specified
+                    fz_pixmap *arg_pix = (fz_pixmap *) pixmap;
+                    if (arg_pix->alpha == 0) {
+                        image = fz_new_image_from_pixmap(gctx, arg_pix, NULL);
+                    } else {  // pixmap has alpha: create an SMask
+                        pm = fz_convert_pixmap(gctx, arg_pix, NULL, NULL, NULL, fz_default_color_params, 1);
+                        pm->alpha = 0;
+                        pm->colorspace = fz_keep_colorspace(gctx, fz_device_gray(gctx));
+                        mask = fz_new_image_from_pixmap(gctx, pm, NULL);
+                        image = fz_new_image_from_pixmap(gctx, arg_pix, mask);
+                    }
+                }
+
+                //-------------------------------------------------------------
+                // image created - now put it in the PDF
+                //-------------------------------------------------------------
+                pdf = page->doc;  // owning PDF
+
+                // get /Resources, /XObject
+                resources = pdf_dict_get_inheritable(gctx, page->obj, PDF_NAME(Resources));
+                xobject = pdf_dict_get(gctx, resources, PDF_NAME(XObject));
+                if (!xobject) {  // has no XObject yet, create one
+                    xobject = pdf_new_dict(gctx, pdf, 10);
+                    pdf_dict_putl_drop(gctx, page->obj, xobject, PDF_NAME(Resources), PDF_NAME(XObject), NULL);
+                }
+
+                ref = pdf_add_image(gctx, pdf, image);
+                pdf_dict_puts_drop(gctx, xobject, _imgname, ref);  // update XObject, release our reference
+
+                // make contents stream that invokes the image
+                nres = fz_new_buffer(gctx, 50);
+                fz_append_printf(gctx, nres, template,
+                                 mat.a, mat.b, mat.c, mat.d, mat.e, mat.f,
+                                 _imgname);
+                JM_insert_contents(gctx, pdf, page->obj, nres, overlay);
+                fz_drop_buffer(gctx, nres);
+            }
+            fz_always(gctx) {
+                fz_drop_image(gctx, image);
+                fz_drop_image(gctx, mask);
+                fz_drop_pixmap(gctx, pix);
+                fz_drop_pixmap(gctx, pm);
+                fz_drop_buffer(gctx, imgbuf);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return_none;
+        }
+
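`_insertImage` does two things worth isolating: when the source has an alpha channel, it separates that channel into a gray image used as `/SMask`, and it invokes the image with the template `" q %g %g %g %g %g %g cm /%s Do Q "`. A sketch of both steps on raw bytes, assuming interleaved samples with `n` colorants followed by one alpha byte and ignoring MuPDF's premultiplied alpha; the function names are hypothetical:

```python
def image_do_command(mat, imgname: str) -> str:
    """Format the contents snippet used by _insertImage.

    mat is a 6-tuple (a, b, c, d, e, f); Python's %g matches C's %g here.
    """
    template = " q %g %g %g %g %g %g cm /%s Do Q "
    return template % (*mat, imgname)


def split_alpha(samples: bytes, n: int):
    """Split interleaved pixels (n colorants + 1 alpha) into color bytes and a mask.

    The color bytes correspond to the image, the alpha bytes to its gray SMask.
    """
    stride = n + 1
    color = bytearray()
    mask = bytearray()
    for i in range(0, len(samples), stride):
        color += samples[i:i + n]      # copy the colorants of one pixel
        mask.append(samples[i + n])    # alpha byte becomes one mask sample
    return bytes(color), bytes(mask)


print(image_do_command((100, 0, 0, 50, 72, 600), "Im1"))
print(split_alpha(bytes([255, 0, 0, 128, 0, 255, 0, 255]), 3))
```

The `cm` operator scales the image's unit square to the target rectangle, which is why the matrix is precomputed before the snippet is formatted.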
+        //---------------------------------------------------------------------
+        // Page.refresh()
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(refresh, !result)
+        PARENTCHECK(refresh, """Refresh page after link/annot/widget updates.""")
+        PyObject *refresh()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!page) return_none;
+            fz_try(gctx) {
+                JM_refresh_link_table(gctx, page);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        //---------------------------------------------------------------------
+        // insert font
+        //---------------------------------------------------------------------
+        %pythoncode
+%{
+def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None,
+               set_simple=False, wmode=0, encoding=0):
+    doc = self.parent
+    if doc is None:
+        raise ValueError("orphaned object: parent is None")
+    idx = 0
+
+    if fontname.startswith("/"):
+        fontname = fontname[1:]
+
+    font = CheckFont(self, fontname)
+    if font is not None:                    # font already in font list of page
+        xref = font[0]                      # this is the xref
+        if CheckFontInfo(doc, xref):        # also in our document font list?
+            return xref                     # yes: we are done
+        # need to build the doc FontInfo entry - done via getCharWidths
+        doc.getCharWidths(xref)
+        return xref
+
+    #--------------------------------------------------------------------------
+    # the font is not present for this page
+    #--------------------------------------------------------------------------
+
+    bfname = Base14_fontdict.get(fontname.lower(), None) # BaseFont if Base-14 font
+
+    serif = 0
+    CJK_number = -1
+    CJK_list_n = ["china-t", "china-s", "japan", "korea"]
+    CJK_list_s = ["china-ts", "china-ss", "japan-s", "korea-s"]
+
+    # reserved CJK names select a built-in font: ordering index plus serif flag
+    if fontname in CJK_list_n:
+        CJK_number = CJK_list_n.index(fontname)
+    elif fontname in CJK_list_s:
+        CJK_number = CJK_list_s.index(fontname)
+        serif = 1
+
+    # install the font for the page
+    val = self._insertFont(fontname, bfname, fontfile, fontbuffer, set_simple, idx,
+                           wmode, serif, encoding, CJK_number)
+
+    if not val:                   # did not work, error return
+        return val
+
+    xref = val[0]                 # xref of installed font
+
+    if CheckFontInfo(doc, xref):  # check again: document already has this font
+        return xref               # we are done
+
+    # need to create document font info
+    doc.getCharWidths(xref)
+    return xref
+
+%}
+
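`insertFont` turns a reserved CJK font name into an ordering index (0 to 3) and a serif flag before calling `_insertFont`; any other name yields ordering `-1`. A standalone sketch of that mapping, with the name lists copied from the code above and a hypothetical function name:

```python
CJK_NORMAL = ["china-t", "china-s", "japan", "korea"]
CJK_SERIF = ["china-ts", "china-ss", "japan-s", "korea-s"]


def cjk_ordering(fontname: str):
    """Return (ordering, serif) for a reserved CJK font name, else (-1, 0)."""
    if fontname in CJK_NORMAL:
        return CJK_NORMAL.index(fontname), 0
    if fontname in CJK_SERIF:
        return CJK_SERIF.index(fontname), 1
    return -1, 0


print(cjk_ordering("japan-s"), cjk_ordering("helv"))
```

Note that the match is case-sensitive, unlike the Base-14 lookup, which lowercases the name first.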
+        FITZEXCEPTION(_insertFont, !result)
+        PyObject *_insertFont(char *fontname, char *bfname,
+                             char *fontfile,
+                             PyObject *fontbuffer,
+                             int set_simple, int idx,
+                             int wmode, int serif,
+                             int encoding, int ordering)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_document *pdf;
+            pdf_obj *resources, *fonts, *font_obj;
+            fz_font *font = NULL;
+            fz_buffer *res = NULL;
+            const unsigned char *data = NULL;
+            int size, ixref = 0, index = 0, simple = 0;
+            PyObject *value;
+            PyObject *exto = NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                pdf = page->doc;
+                // get the objects /Resources, /Resources/Font
+                resources = pdf_dict_get_inheritable(gctx, page->obj, PDF_NAME(Resources));
+                fonts = pdf_dict_get(gctx, resources, PDF_NAME(Font));
+                if (!fonts) {  // page has no fonts yet
+                    fonts = pdf_new_dict(gctx, pdf, 10);
+                    pdf_dict_putl_drop(gctx, page->obj, fonts, PDF_NAME(Resources), PDF_NAME(Font), NULL);
+                }
+
+                //-------------------------------------------------------------
+                // check for CJK font
+                //-------------------------------------------------------------
+                if (ordering > -1) data = fz_lookup_cjk_font(gctx, ordering, &size, &index);
+                if (data) {
+                    font = fz_new_font_from_memory(gctx, NULL, data, size, index, 0);
+                    font_obj = pdf_add_cjk_font(gctx, pdf, font, ordering, wmode, serif);
+                    exto = JM_UnicodeFromStr("n/a");
+                    simple = 0;
+                    goto weiter;
+                }
+
+                //-------------------------------------------------------------
+                // check for PDF Base-14 font
+                //-------------------------------------------------------------
+                if (bfname) data = fz_lookup_base14_font(gctx, bfname, &size);
+                if (data) {
+                    font = fz_new_font_from_memory(gctx, bfname, data, size, 0, 0);
+                    font_obj = pdf_add_simple_font(gctx, pdf, font, encoding);
+                    exto = JM_UnicodeFromStr("n/a");
+                    simple = 1;
+                    goto weiter;
+                }
+
+                if (fontfile) {
+                    font = fz_new_font_from_file(gctx, NULL, fontfile, idx, 0);
+                } else {
+                    res = JM_BufferFromBytes(gctx, fontbuffer);
+                    if (!res) THROWMSG("need one of fontfile, fontbuffer");
+                    font = fz_new_font_from_buffer(gctx, NULL, res, idx, 0);
+                }
+
+                if (!set_simple) {
+                    font_obj = pdf_add_cid_font(gctx, pdf, font);
+                    simple = 0;
+                } else {
+                    font_obj = pdf_add_simple_font(gctx, pdf, font, encoding);
+                    simple = 2;
+                }
+
+                weiter: ;  // "weiter" (German: "continue"): common exit for the CJK, Base-14 and file/buffer cases
+                ixref = pdf_to_num(gctx, font_obj);
+
+                PyObject *name = JM_EscapeStrFromStr(pdf_to_name(gctx,
+                            pdf_dict_get(gctx, font_obj, PDF_NAME(BaseFont))));
+
+                PyObject *subt = JM_UnicodeFromStr(pdf_to_name(gctx,
+                            pdf_dict_get(gctx, font_obj, PDF_NAME(Subtype))));
+
+                if (!exto)
+                    exto = JM_UnicodeFromStr(JM_get_fontextension(gctx, pdf, ixref));
+
+                value = Py_BuildValue("[i, {s:O, s:O, s:O, s:O, s:i}]",
+                                      ixref,
+                                      "name", name,        // base font name
+                                      "type", subt,        // subtype
+                                      "ext", exto,         // file extension
+                                      "simple", JM_BOOL(simple), // simple font?
+                                      "ordering", ordering); // CJK font?
+                Py_CLEAR(exto);
+                Py_CLEAR(name);
+                Py_CLEAR(subt);
+
+                // store the font in /Resources/Font under 'fontname' (the dict takes ownership of font_obj)
+                pdf_dict_puts_drop(gctx, fonts, fontname, font_obj);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+                fz_drop_font(gctx, font);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return value;
+        }
+
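`_insertFont` tries its font sources in a fixed order: built-in CJK font, then Base-14 font, then a font file or buffer, and records an internal `simple` code (0 = CID font, 1 = Base-14, 2 = simple font from file). A sketch of that decision, assuming the CJK and Base-14 lookups always find data (the C code actually checks what `fz_lookup_cjk_font` / `fz_lookup_base14_font` return); `classify_font` is hypothetical:

```python
def classify_font(ordering: int, bfname, set_simple: bool):
    """Return (source, simple) in the order _insertFont tries them."""
    if ordering > -1:
        return "cjk", 0                   # pdf_add_cjk_font
    if bfname is not None:
        return "base14", 1                # pdf_add_simple_font
    if set_simple:
        return "file/buffer", 2           # pdf_add_simple_font
    return "file/buffer", 0               # pdf_add_cid_font


print(classify_font(-1, "Helvetica", False))
```

The returned dictionary only exposes `JM_BOOL(simple)`, so codes 1 and 2 both appear to Python as `True`.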
+        //---------------------------------------------------------------------
+        // Get page transformation matrix
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(transformationMatrix, """Page transformation matrix.""")
+        %pythonappend transformationMatrix %{
+        if self.rotation % 360 == 0:
+            val = Matrix(val)
+        else:
+            val = Matrix(1, 0, 0, -1, 0, self.CropBox.height)
+        %}
+        PyObject *transformationMatrix()
+        {
+            fz_matrix ctm = fz_identity;
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            if (!page) return JM_py_from_matrix(ctm);
+            fz_try(gctx) {
+                pdf_page_transform(gctx, page, NULL, &ctm);
+            }
+            fz_catch(gctx) {;}
+            return JM_py_from_matrix(ctm);
+        }
+
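For rotated pages, the `%pythonappend` above falls back to `Matrix(1, 0, 0, -1, 0, self.CropBox.height)`, which flips the y axis so that PDF coordinates (origin bottom left) become MuPDF coordinates (origin top left). A sketch of applying such a matrix, assuming the CropBox starts at y = 0 and using MuPDF's (a, b, c, d, e, f) convention; the helper names are hypothetical:

```python
def apply_matrix(m, point):
    """Apply a matrix (a, b, c, d, e, f) to (x, y): x' = ax + cy + e, y' = bx + dy + f."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def flip_matrix(height: float):
    """Matrix(1, 0, 0, -1, 0, height): PDF space (y up) to fitz space (y down)."""
    return (1, 0, 0, -1, 0, height)


print(apply_matrix(flip_matrix(792), (72, 700)))
```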
+        //---------------------------------------------------------------------
+        // Page Get list of contents objects
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getContents, !result)
+        PARENTCHECK(_getContents, """Get xref list of /Contents objects.""")
+        PyObject *_getContents()
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            PyObject *list = NULL;
+            pdf_obj *contents = NULL, *icont = NULL;
+            int i, xref;
+            size_t n = 0;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                contents = pdf_dict_get(gctx, page->obj, PDF_NAME(Contents));
+                if (pdf_is_array(gctx, contents)) {
+                    n = pdf_array_len(gctx, contents);
+                    list = PyList_New(n);
+                    for (i = 0; i < n; i++) {
+                        icont = pdf_array_get(gctx, contents, i);
+                        xref = pdf_to_num(gctx, icont);
+                        PyList_SET_ITEM(list, i, Py_BuildValue("i", xref));
+                    }
+                }
+                else if (contents) {
+                    list = PyList_New(1);
+                    xref = pdf_to_num(gctx, contents);
+                    PyList_SET_ITEM(list, 0, Py_BuildValue("i", xref));
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            if (list) {
+                return list;
+            }
+            return PyList_New(0);
+        }
+
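A page's `/Contents` may be missing, a single stream reference, or an array of references; `_getContents` normalizes all three into a list of xrefs. The same normalization, with the three cases modelled as `None`, an `int`, or a `list` (a simplification, not the MuPDF object API); `contents_xrefs` is hypothetical:

```python
from typing import List, Union


def contents_xrefs(contents: Union[None, int, List[int]]) -> List[int]:
    """Normalize /Contents to a list of xrefs, as _getContents does."""
    if contents is None:
        return []                  # no /Contents: empty list
    if isinstance(contents, list):
        return list(contents)      # array of stream references
    return [contents]              # single stream reference


print(contents_xrefs(7), contents_xrefs([4, 5]), contents_xrefs(None))
```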
+        //---------------------------------------------------------------------
+        // Set given object as the /Contents of a page
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_setContents, !result)
+        PARENTCHECK(_setContents, """Set the stream at 'xref' as the (only) /Contents object.""")
+        PyObject *_setContents(int xref = 0)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self);
+            pdf_obj *contents = NULL;
+
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+
+                if (!INRANGE(xref, 1, pdf_xref_len(gctx, page->doc) - 1))
+                    THROWMSG("xref out of range");
+
+                contents = pdf_new_indirect(gctx, page->doc, xref, 0);
+                if (!pdf_is_stream(gctx, contents))
+                    THROWMSG("xref is not a stream");
+
+                pdf_dict_put_drop(gctx, page->obj, PDF_NAME(Contents), contents);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            page->doc->dirty = 1;
+            return_none;
+        }
+
+        %pythoncode %{
+        @property
+        def _isWrapped(self):
+            """Check if /Contents is wrapped in the operator pair "q" / "Q".
+            """
+            cont = self.readContents().split()
+            if len(cont) < 1 or cont[0] != b"q" or cont[-1] != b"Q":
+                return False
+            return True
+
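`_isWrapped` tokenizes the concatenated contents with `bytes.split()`, which returns `bytes` tokens; both ends must therefore be compared with bytes literals, because in Python 3 `b"Q" == "Q"` is `False`. A standalone sketch of the same check on raw bytes (`is_wrapped` is a hypothetical name):

```python
def is_wrapped(contents: bytes) -> bool:
    """True if the content stream's first token is q and its last token is Q."""
    tokens = contents.split()
    return len(tokens) > 0 and tokens[0] == b"q" and tokens[-1] == b"Q"


print(is_wrapped(b"q\n1 0 0 RG\nQ"), is_wrapped(b"BT ET"))
```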
+        def _wrapContents(self):
+            TOOLS._insert_contents(self, b"q\n", False)
+            TOOLS._insert_contents(self, b"\nQ", True)
+
+        wrapContents = _wrapContents
+
+
+        def links(self, kinds=None):
+            """ Generator over the links of a page.
+
+            Args:
+                kinds: (tuple) link kinds to subselect from. If None,
+                        all links are returned. E.g. kinds=(LINK_URI,)
+                        will only yield URI links.
+            """
+            all_links = self.getLinks()
+            for link in all_links:
+                if kinds is None or link["kind"] in kinds:
+                    yield (link)
+
+
+        def annots(self, types=None):
+            """ Generator over the annotations of a page.
+
+            Args:
+                types: (tuple) annotation types to subselect from. If None,
+                        all annotations are returned. E.g. types=(PDF_ANNOT_LINE,)
+                        will only yield line annotations.
+            """
+            annot = self.firstAnnot
+            while annot:
+                if types is None or annot.type[0] in types:
+                    yield (annot)
+                annot = annot.next
+
+
+        def widgets(self, types=None):
+            """ Generator over the widgets of a page.
+
+            Args:
+                types: (tuple) field types to subselect from. If None,
+                        all fields are returned. E.g. types=(PDF_WIDGET_TYPE_TEXT,)
+                        will only yield text fields.
+            """
+            widget = self.firstWidget
+            while widget:
+                if types is None or widget.field_type in types:
+                    yield (widget)
+                widget = widget.next
+
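`annots()` and `widgets()` share one pattern: start at the first object, follow `.next` until it is `None`, and yield only objects whose type is in the optional filter tuple. A sketch of that pattern with a hypothetical stand-in class `Item` in place of `Annot` / `Widget`:

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Item:
    """Stand-in for Annot/Widget: a type code plus a `next` link."""
    type_code: int
    next: Optional["Item"] = None


def iter_chain(first: Optional[Item], types=None) -> Iterator[Item]:
    """Walk first -> next -> ... and yield items whose type is in `types`."""
    item = first
    while item:
        if types is None or item.type_code in types:
            yield item
        item = item.next


chain = Item(1, Item(2, Item(1)))
print([i.type_code for i in iter_chain(chain, types=(1,))])
```

Being generators, these methods read the chain lazily, so deleting annotations while iterating can skip or invalidate items.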
+
+        def __str__(self):
+            CheckParent(self)
+            x = self.parent.name
+            if self.parent.stream is not None:
+                x = "<memory, doc# %i>" % (self.parent._graft_id,)
+            if x == "":
+                x = "<new PDF, doc# %i>" % self.parent._graft_id
+            return "page %s of %s" % (self.number, x)
+
+        def __repr__(self):
+            CheckParent(self)
+            x = self.parent.name
+            if self.parent.stream is not None:
+                x = "<memory, doc# %i>" % (self.parent._graft_id,)
+            if x == "":
+                x = "<new PDF, doc# %i>" % self.parent._graft_id
+            return "page %s of %s" % (self.number, x)
+
+        def _forget_annot(self, annot):
+            """Remove an annot from reference dictionary."""
+            aid = id(annot)
+            if aid in self._annot_refs:
+                self._annot_refs[aid] = None
+
+        def _reset_annot_refs(self):
+            """Invalidate / delete all annots of this page."""
+            for annot in self._annot_refs.values():
+                if annot:
+                    annot._erase()
+            self._annot_refs.clear()
+
+        @property
+        def xref(self):
+            """PDF xref number of page."""
+            CheckParent(self)
+            return self.parent._getPageXref(self.number)[0]
+
+        def _erase(self):
+            self._reset_annot_refs()
+            try:
+                self.parent._forget_page(self)
+            except:
+                pass
+            if getattr(self, "thisown", False):
+                self.__swig_destroy__(self)
+            self.parent = None
+            self.thisown = False
+            self.number = None
+
+        def __del__(self):
+            self._erase()
+
+        def getFontList(self, full=False):
+            """List of fonts defined in the page object."""
+            CheckParent(self)
+            return self.parent.getPageFontList(self.number, full=full)
+
+        def getImageList(self, full=False):
+            """List of images defined in the page object."""
+            CheckParent(self)
+            return self.parent.getPageImageList(self.number, full=full)
+
+
+        def readContents(self):
+            """All /Contents streams concatenated in one bytes object."""
+            return TOOLS._get_all_contents(self)
+
+
+        @property
+        def MediaBoxSize(self):
+            """Point(width, height) of the page MediaBox."""
+            return Point(self.MediaBox.width, self.MediaBox.height)
+
+        def cleanContents(self):
+            self._cleanContents()
+
+        getContents = _getContents
+        %}
+    }
+};
+%clearnodefaultctor;
+
+//-----------------------------------------------------------------------------
+// Pixmap
+//-----------------------------------------------------------------------------
+struct Pixmap
+{
+    %extend {
+        ~Pixmap() {
+            DEBUGMSG1("Pixmap");
+            fz_pixmap *this_pix = (fz_pixmap *) $self;
+            fz_drop_pixmap(gctx, this_pix);
+            DEBUGMSG2;
+        }
+        FITZEXCEPTION(Pixmap, !result)
+        %pythonprepend Pixmap
+%{"""Pixmap(colorspace, irect, alpha) - empty pixmap.
+Pixmap(colorspace, src) - copy changing colorspace.
+Pixmap(src, width, height,[clip]) - scaled copy, float dimensions.
+Pixmap(src, alpha=1) - copy and add or drop alpha channel.
+Pixmap(filename) - from an image in a file.
+Pixmap(image) - from an image in memory (bytes).
+Pixmap(colorspace, width, height, samples, alpha) - from samples data.
+Pixmap(PDFdoc, xref) - from an image at xref in a PDF document.
+"""%}
+        //---------------------------------------------------------------------
+        // create empty pixmap with colorspace and IRect
+        //---------------------------------------------------------------------
+        Pixmap(struct Colorspace *cs, PyObject *bbox, int alpha = 0)
+        {
+            fz_pixmap *pm = NULL;
+            fz_try(gctx) {
+                pm = fz_new_pixmap_with_bbox(gctx, (fz_colorspace *) cs, JM_irect_from_py(bbox), NULL, alpha);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pm;
+        }
+
+        //---------------------------------------------------------------------
+        // copy pixmap, converting colorspace
+        // New in v1.11: option to remove alpha.
+        // Changed in v1.13: option removed, because alpha = 0 has not worked
+        // since at least v1.12; the alpha channel is now always kept.
+        //---------------------------------------------------------------------
+        Pixmap(struct Colorspace *cs, struct Pixmap *spix)
+        {
+            fz_pixmap *pm = NULL;
+            fz_try(gctx) {
+                if (!fz_pixmap_colorspace(gctx, (fz_pixmap *) spix))
+                    THROWMSG("cannot copy pixmap with NULL colorspace");
+                pm = fz_convert_pixmap(gctx, (fz_pixmap *) spix, (fz_colorspace *) cs, NULL, NULL, fz_default_color_params, 1);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pm;
+        }
+
+
+        //---------------------------------------------------------------------
+        // create pixmap as scaled copy of another one
+        //---------------------------------------------------------------------
+        Pixmap(struct Pixmap *spix, float w, float h, PyObject *clip=NULL)
+        {
+            fz_pixmap *pm = NULL;
+            fz_pixmap *src_pix = (fz_pixmap *) spix;
+            fz_try(gctx) {
+                fz_irect bbox = JM_irect_from_py(clip);
+                if (!fz_is_infinite_irect(bbox)) {
+                    pm = fz_scale_pixmap(gctx, src_pix, src_pix->x, src_pix->y, w, h, &bbox);
+                } else {
+                    pm = fz_scale_pixmap(gctx, src_pix, src_pix->x, src_pix->y, w, h, NULL);
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pm;
+        }
+
+
+        //---------------------------------------------------------------------
+        // copy pixmap & add / drop the alpha channel
+        //---------------------------------------------------------------------
+        Pixmap(struct Pixmap *spix, int alpha=1)
+        {
+            fz_pixmap *pm = NULL, *src_pix = (fz_pixmap *) spix;
+            int n, w, h, i;
+            fz_separations *seps = NULL;
+            fz_try(gctx) {
+                if (!INRANGE(alpha, 0, 1))
+                    THROWMSG("illegal alpha value");
+                fz_colorspace *cs = fz_pixmap_colorspace(gctx, src_pix);
+                if (!cs && !alpha)
+                    THROWMSG("cannot drop alpha for 'NULL' colorspace");
+                n = fz_pixmap_colorants(gctx, src_pix);
+                w = fz_pixmap_width(gctx, src_pix);
+                h = fz_pixmap_height(gctx, src_pix);
+                pm = fz_new_pixmap(gctx, cs, w, h, seps, alpha);
+                pm->x = src_pix->x;
+                pm->y = src_pix->y;
+                pm->xres = src_pix->xres;
+                pm->yres = src_pix->yres;
+
+                // copy samples data ------------------------------------------
+                unsigned char *sptr = src_pix->samples;
+                unsigned char *tptr = pm->samples;
+                if (src_pix->alpha == pm->alpha) {  // identical samples
+                    memcpy(tptr, sptr, w * h * (n + alpha));
+                } else {
+                    for (i = 0; i < w * h; i++) {
+                        memcpy(tptr, sptr, n);
+                        tptr += n;
+                        if (pm->alpha) {
+                            tptr[0] = 255;
+                            tptr++;
+                        }
+                        sptr += n + src_pix->alpha;
+                    }
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pm;
+        }
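The add/drop-alpha constructor above copies the color bytes of each pixel and then either appends an opaque alpha byte (255) or skips the source alpha byte. A minimal pure-Python sketch of that sample loop, assuming tightly packed rows and no spot separations; `copy_samples` is a hypothetical helper, not part of PyMuPDF:

```python
def copy_samples(src: bytes, w: int, h: int, n: int,
                 src_alpha: int, dst_alpha: int) -> bytes:
    """Copy w*h pixels of n colorants, adding or dropping one alpha byte.

    Hypothetical helper; assumes packed rows and no spot separations.
    """
    if src_alpha == dst_alpha:
        # Identical layout: one flat copy, like the memcpy branch.
        return bytes(src[: w * h * (n + src_alpha)])
    out = bytearray()
    step = n + src_alpha  # bytes per source pixel
    for i in range(w * h):
        start = i * step
        out += src[start:start + n]  # color bytes only
        if dst_alpha:
            out.append(255)  # new alpha channel is fully opaque
    return bytes(out)


# Usage: add alpha to a 2 x 1 RGB pixmap, then drop it again.
rgb = bytes([1, 2, 3, 4, 5, 6])
rgba = copy_samples(rgb, 2, 1, 3, 0, 1)
print(list(rgba))                                # [1, 2, 3, 255, 4, 5, 6, 255]
print(copy_samples(rgba, 2, 1, 3, 1, 0) == rgb)  # True
```

When both layouts match, the whole buffer is copied in one step, mirroring the `memcpy` branch of the C code.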
+
+        //---------------------------------------------------------------------
+        // create pixmap from samples data
+        //---------------------------------------------------------------------
+        Pixmap(struct Colorspace *cs, int w, int h, PyObject *samples, int alpha=0)
+        {
+            int n = fz_colorspace_n(gctx, (fz_colorspace *) cs);
+            int stride = (n + alpha)*w;
+            fz_separations *seps = NULL;
+            fz_buffer *res = NULL;
+            fz_pixmap *pm = NULL;
+            fz_try(gctx) {
+                size_t size = 0;
+                unsigned char *c = NULL;
+                res = JM_BufferFromBytes(gctx, samples);
+                if (!res) THROWMSG("bad samples data");
+                size = fz_buffer_storage(gctx, res, &c);
+                if (stride * h != size) THROWMSG("bad samples length");
+                pm = fz_new_pixmap(gctx, (fz_colorspace *) cs, w, h, seps, alpha);
+                memcpy(pm->samples, c, size);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pm;
+        }
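The samples constructor rejects input whose length differs from (n + alpha) * w * h, where n is the colorant count of `cs`. The check can be sketched in Python as follows; `check_samples` is a hypothetical name, and rows are assumed to carry no padding:

```python
def check_samples(n: int, alpha: int, w: int, h: int, samples: bytes) -> int:
    """Return the row stride if len(samples) fits the pixmap, else raise.

    n is the colorant count of the colorspace; alpha is 0 or 1.
    """
    stride = (n + alpha) * w  # bytes per row, no padding
    if stride * h != len(samples):
        raise ValueError("bad samples length")
    return stride


print(check_samples(3, 0, 2, 2, bytes(12)))  # 6
```

For example, a 2 x 2 RGB pixmap without alpha needs exactly 12 bytes of samples.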
+
+        //---------------------------------------------------------------------
+        // create pixmap from filename
+        //---------------------------------------------------------------------
+        Pixmap(char *filename)
+        {
+            fz_image *img = NULL;
+            fz_pixmap *pm = NULL;
+            fz_try(gctx) {
+                img = fz_new_image_from_file(gctx, filename);
+                pm = fz_get_pixmap_from_image(gctx, img, NULL, NULL, NULL, NULL);
+                int xres, yres;
+                fz_image_resolution(img, &xres, &yres);
+                pm->xres = xres;
+                pm->yres = yres;
+            }
+            fz_always(gctx) {
+                fz_drop_image(gctx, img);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pm;
+        }
+
+        //---------------------------------------------------------------------
+        // create pixmap from in-memory image
+        //---------------------------------------------------------------------
+        Pixmap(PyObject *imagedata)
+        {
+            fz_buffer *res = NULL;
+            fz_image *img = NULL;
+            fz_pixmap *pm = NULL;
+            fz_try(gctx) {
+                res = JM_BufferFromBytes(gctx, imagedata);
+                if (!res) THROWMSG("bad image data");
+                img = fz_new_image_from_buffer(gctx, res);
+                pm = fz_get_pixmap_from_image(gctx, img, NULL, NULL, NULL, NULL);
+                int xres, yres;
+                fz_image_resolution(img, &xres, &yres);
+                pm->xres = xres;
+                pm->yres = yres;
+            }
+            fz_always(gctx) {
+                fz_drop_image(gctx, img);
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pm;
+        }
+
+
+        //---------------------------------------------------------------------
+        // Create pixmap from PDF image identified by XREF number
+        //---------------------------------------------------------------------
+        Pixmap(struct Document *doc, int xref)
+        {
+            fz_image *img = NULL;
+            fz_pixmap *pix = NULL;
+            pdf_obj *ref = NULL;
+            pdf_obj *type;
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) doc);
+            fz_try(gctx) {
+                ASSERT_PDF(pdf);
+                int xreflen = pdf_xref_len(gctx, pdf);
+                if (!INRANGE(xref, 1, xreflen-1))
+                    THROWMSG("xref out of range");
+                ref = pdf_new_indirect(gctx, pdf, xref, 0);
+                type = pdf_dict_get(gctx, ref, PDF_NAME(Subtype));
+                if (!pdf_name_eq(gctx, type, PDF_NAME(Image)))
+                    THROWMSG("xref not an image");
+                img = pdf_load_image(gctx, pdf, ref);
+                pix = fz_get_pixmap_from_image(gctx, img, NULL, NULL, NULL, NULL);
+            }
+            fz_always(gctx) {
+                fz_drop_image(gctx, img);
+                pdf_drop_obj(gctx, ref);
+            }
+            fz_catch(gctx) {
+                fz_drop_pixmap(gctx, pix);
+                return NULL;
+            }
+            return (struct Pixmap *) pix;
+        }
+
+
+        //---------------------------------------------------------------------
+        // shrink
+        //---------------------------------------------------------------------
+        %pythonprepend shrink
+%{"""Divide width and height by 2**factor.
+E.g. factor=1 halves width and height, leaving 25% of the original area (in place)."""%}
+        void shrink(int factor)
+        {
+            if (factor < 1)
+            {
+                JM_Warning("ignoring shrink factor < 1");
+                return;
+            }
+            fz_subsample_pixmap(gctx, (fz_pixmap *) $self, factor);
+        }
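`shrink(factor)` subsamples in place, so each side is divided by 2**factor and the area by 4**factor. The helper below only computes the resulting dimensions. Rounding partial edge blocks up is an assumption about MuPDF's `fz_subsample_pixmap`; for sides that are multiples of 2**factor the result is exact either way. `shrunk_size` is a hypothetical name:

```python
from typing import Tuple


def shrunk_size(w: int, h: int, factor: int) -> Tuple[int, int]:
    """Dimensions after subsampling by 2**factor (edges rounded up: an assumption)."""
    if factor < 1:
        return w, h  # Pixmap.shrink ignores factors below 1
    f = 1 << factor
    return (w + f - 1) // f, (h + f - 1) // f


w, h = shrunk_size(100, 60, 1)
print(w, h, (w * h) / (100 * 60))  # 50 30 0.25
```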
+
+        //---------------------------------------------------------------------
+        // apply gamma correction
+        //---------------------------------------------------------------------
+        %pythonprepend gammaWith
+%{"""Apply gamma correction with the given float value.
+gamma=1 is a no-op."""%}
+        void gammaWith(float gamma)
+        {
+            if (!fz_pixmap_colorspace(gctx, (fz_pixmap *) $self))
+            {
+                JM_Warning("colorspace invalid for function");
+                return;
+            }
+            fz_gamma_pixmap(gctx, (fz_pixmap *) $self, gamma);
+        }
+
+        //---------------------------------------------------------------------
+        // tint pixmap with color
+        //---------------------------------------------------------------------
+        %pythonprepend tintWith
+%{"""Tint colors with modifiers for black and white."""
+
+if not self.colorspace or self.colorspace.n > 3:
+    print("warning: colorspace invalid for function")
+    return%}
+        void tintWith(int black, int white)
+        {
+            fz_tint_pixmap(gctx, (fz_pixmap *) $self, black, white);
+        }
+
+        //----------------------------------------------------------------------
+        // clear all pixmap samples to 0x00
+        //----------------------------------------------------------------------
+        %pythonprepend clearWith
+        %{"""Fill all color components with same value."""%}
+        void clearWith()
+        {
+            fz_clear_pixmap(gctx, (fz_pixmap *) $self);
+        }
+
+        //----------------------------------------------------------------------
+        // clear the whole pixmap with value
+        //----------------------------------------------------------------------
+        void clearWith(int value)
+        {
+            fz_clear_pixmap_with_value(gctx, (fz_pixmap *) $self, value);
+        }
+
+        //----------------------------------------------------------------------
+        // clear pixmap rectangle with value
+        //----------------------------------------------------------------------
+        void clearWith(int value, PyObject *bbox)
+        {
+            JM_clear_pixmap_rect_with_value(gctx, (fz_pixmap *) $self, value, JM_irect_from_py(bbox));
+        }
+
+        //----------------------------------------------------------------------
+        // copy pixmaps
+        //----------------------------------------------------------------------
+        FITZEXCEPTION(copyPixmap, !result)
+        %pythonprepend copyPixmap %{"""Copy bbox from another Pixmap."""%}
+        PyObject *copyPixmap(struct Pixmap *src, PyObject *bbox)
+        {
+            fz_try(gctx) {
+                fz_pixmap *pm = (fz_pixmap *) $self, *src_pix = (fz_pixmap *) src;
+                if (!fz_pixmap_colorspace(gctx, src_pix))
+                    THROWMSG("cannot copy pixmap with NULL colorspace");
+                if (pm->alpha != src_pix->alpha)
+                    THROWMSG("source and target alpha must be equal");
+                fz_copy_pixmap_rect(gctx, pm, src_pix, JM_irect_from_py(bbox), NULL);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //----------------------------------------------------------------------
+        // set alpha values
+        //----------------------------------------------------------------------
+        FITZEXCEPTION(setAlpha, !result)
+        %pythonprepend setAlpha
+%{"""Set alphas to values contained in a byte array.
+If omitted, set alphas to 255."""%}
+        PyObject *setAlpha(PyObject *alphavalues=NULL)
+        {
+            fz_buffer *res = NULL;
+            fz_pixmap *pix = (fz_pixmap *) $self;
+            fz_try(gctx) {
+                if (pix->alpha == 0) THROWMSG("pixmap has no alpha");
+                int n = fz_pixmap_colorants(gctx, pix);
+                int w = fz_pixmap_width(gctx, pix);
+                int h = fz_pixmap_height(gctx, pix);
+                int balen = w * h * (n+1);
+                unsigned char *data = NULL;
+                int data_len = 0;
+                if (alphavalues) {
+                    res = JM_BufferFromBytes(gctx, alphavalues);
+                    if (res) {
+                        data_len = (int) fz_buffer_storage(gctx, res, &data);
+                        if (data && data_len < w * h)
+                            THROWMSG("not enough alpha values");
+                    }
+                    else THROWMSG("bad type: 'alphavalues'");
+                }
+                int i = 0, k = 0;
+                while (i < balen) {
+                    if (data_len) pix->samples[i+n] = data[k];
+                    else          pix->samples[i+n] = 255;
+                    i += n+1;
+                    k += 1;
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
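`setAlpha` walks the samples in steps of n + 1 and overwrites the last byte of each pixel, taking values from `alphavalues` or using 255. A pure-Python sketch of that loop on a `bytearray`, assuming packed rows; `set_alpha` is hypothetical:

```python
from typing import Optional


def set_alpha(samples: bytearray, w: int, h: int, n: int,
              alphavalues: Optional[bytes] = None) -> None:
    """Overwrite the alpha byte of every pixel in place.

    Each pixel is n colorant bytes followed by one alpha byte.
    """
    if alphavalues and len(alphavalues) < w * h:
        raise ValueError("not enough alpha values")
    step = n + 1
    for k in range(w * h):
        # Use the k-th supplied value, or 255 (opaque) when none are given.
        samples[k * step + n] = alphavalues[k] if alphavalues else 255


pix = bytearray([10, 20, 30, 0, 40, 50, 60, 0])  # 2 x 1 RGBA
set_alpha(pix, 2, 1, 3, bytes([128, 64]))
print(list(pix))  # [10, 20, 30, 128, 40, 50, 60, 64]
```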
+
+        //----------------------------------------------------------------------
+        // Pixmap._getImageData
+        //----------------------------------------------------------------------
+        FITZEXCEPTION(_getImageData, !result)
+        PyObject *_getImageData(int format)
+        {
+            fz_output *out = NULL;
+            fz_buffer *res = NULL;
+            PyObject *barray = NULL;
+            fz_pixmap *pm = (fz_pixmap *) $self;
+            fz_try(gctx) {
+                size_t size = fz_pixmap_stride(gctx, pm) * pm->h;
+                res = fz_new_buffer(gctx, size);
+                out = fz_new_output_with_buffer(gctx, res);
+
+                switch(format) {
+                    case(1):
+                        fz_write_pixmap_as_png(gctx, out, pm);
+                        break;
+                    case(2):
+                        fz_write_pixmap_as_pnm(gctx, out, pm);
+                        break;
+                    case(3):
+                        fz_write_pixmap_as_pam(gctx, out, pm);
+                        break;
+                    case(5):           // Adobe Photoshop Document
+                        fz_write_pixmap_as_psd(gctx, out, pm);
+                        break;
+                    case(6):           // Postscript format
+                        fz_write_pixmap_as_ps(gctx, out, pm);
+                        break;
+                    default:
+                        fz_write_pixmap_as_png(gctx, out, pm);
+                        break;
+                }
+                barray = JM_BinFromBuffer(gctx, res);
+            }
+            fz_always(gctx) {
+                fz_drop_output(gctx, out);
+                fz_drop_buffer(gctx, res);
+            }
+
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return barray;
+        }
+
+        %pythoncode %{
+def getImageData(self, output="png"):
+    """Convert to binary image stream of desired type.
+
+    Can be used as input to GUI packages like tkinter.
+
+    Args:
+        output: (str) image type, default is PNG. Others are PNM, PGM, PPM,
+                PBM, PAM, PSD, PS.
+    Returns:
+        Bytes object.
+    """
+    valid_formats = {"png": 1, "pnm": 2, "pgm": 2, "ppm": 2, "pbm": 2,
+                     "pam": 3, "tga": 4, "tpic": 4,
+                     "psd": 5, "ps": 6}
+    idx = valid_formats.get(output.lower(), 1)
+    if self.alpha and idx in (2, 6):
+        raise ValueError("'%s' cannot have alpha" % output)
+    if self.colorspace and self.colorspace.n > 3 and idx in (1, 2, 4):
+        raise ValueError("unsupported colorspace for '%s'" % output)
+    barray = self._getImageData(idx)
+    return barray
+
+def getPNGdata(self):
+    """Wrapper for Pixmap.getImageData("png")."""
+    barray = self._getImageData(1)
+    return barray
+
+def getPNGData(self):
+    """Wrapper for Pixmap.getImageData("png")."""
+    barray = self._getImageData(1)
+    return barray
+    %}
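`getImageData` (and `writeImage`) map a format name to an internal index before calling the C layer. A standalone sketch of that lookup and its validation, with the colorspace passed as its component count (0 for none); `resolve_format` is hypothetical. Index 4 (TGA) has no case in `_getImageData` or `_writeIMG`, so the C switch falls through to its PNG default, and unknown names also resolve to PNG:

```python
VALID_FORMATS = {"png": 1, "pnm": 2, "pgm": 2, "ppm": 2, "pbm": 2,
                 "pam": 3, "tga": 4, "tpic": 4, "psd": 5, "ps": 6}


def resolve_format(output: str, alpha: bool, cs_n: int) -> int:
    """Map a format name to the index passed to _getImageData / _writeIMG.

    cs_n is the colorspace component count (0 when there is no colorspace).
    """
    idx = VALID_FORMATS.get(output.lower(), 1)  # unknown names fall back to PNG
    if alpha and idx in (2, 6):
        raise ValueError("'%s' cannot have alpha" % output)
    if cs_n > 3 and idx in (1, 2, 4):
        raise ValueError("unsupported colorspace for '%s'" % output)
    return idx


print(resolve_format("PPM", False, 3))  # 2
print(resolve_format("jpg", False, 3))  # 1
```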
+
+        //----------------------------------------------------------------------
+        // _writeIMG
+        //----------------------------------------------------------------------
+        FITZEXCEPTION(_writeIMG, !result)
+        PyObject *_writeIMG(char *filename, int format)
+        {
+            fz_try(gctx) {
+                fz_pixmap *pm = (fz_pixmap *) $self;
+                switch(format) {
+                    case(1):
+                        fz_save_pixmap_as_png(gctx, pm, filename);
+                        break;
+                    case(2):
+                        fz_save_pixmap_as_pnm(gctx, pm, filename);
+                        break;
+                    case(3):
+                        fz_save_pixmap_as_pam(gctx, pm, filename);
+                        break;
+                    case(5): // Adobe Photoshop Document
+                        fz_save_pixmap_as_psd(gctx, pm, filename);
+                        break;
+                    case(6): // Postscript
+                        fz_save_pixmap_as_ps(gctx, pm, filename, 0);
+                        break;
+                    default:
+                        fz_save_pixmap_as_png(gctx, pm, filename);
+                        break;
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+        %pythoncode %{
+def writeImage(self, filename, output=None):
+    """Output as image in format determined by filename extension.
+
+    Args:
+        filename: (str) path of the output file.
+        output: (str) only use to override the filename extension. Default
+                is PNG. Others are PNM, PGM, PPM, PBM, PAM, PSD, PS.
+    Returns:
+        None. The image is written to 'filename'.
+    """
+    valid_formats = {"png": 1, "pnm": 2, "pgm": 2, "ppm": 2, "pbm": 2,
+                     "pam": 3, "tga": 4, "tpic": 4,
+                     "psd": 5, "ps": 6}
+    if output is None:
+        _, ext = os.path.splitext(filename)
+        output = ext[1:]
+
+    idx = valid_formats.get(output.lower(), 1)
+
+    if self.alpha and idx in (2, 6):
+        raise ValueError("'%s' cannot have alpha" % output)
+    if self.colorspace and self.colorspace.n > 3 and idx in (1, 2, 4):
+        raise ValueError("unsupported colorspace for '%s'" % output)
+
+    return self._writeIMG(filename, idx)
+
+def writePNG(self, filename):
+    """Wrapper for Pixmap.writeImage(filename, "png")."""
+    return self._writeIMG(filename, 1)
+
+
+def pillowWrite(self, *args, **kwargs):
+    """Write to image file using Pillow.
+
+    Arguments are passed to Pillow's Image.save() method.
+    Use instead of writeImage when other output formats are desired.
+    """
+    try:
+        from PIL import Image
+    except ImportError:
+        print("PIL/Pillow not installed")
+        raise
+
+    cspace = self.colorspace
+    if cspace is None:
+        mode = "L"
+    elif cspace.n == 1:
+        mode = "L" if self.alpha == 0 else "LA"
+    elif cspace.n == 3:
+        mode = "RGB" if self.alpha == 0 else "RGBA"
+    else:
+        mode = "CMYK"
+
+    img = Image.frombytes(mode, (self.width, self.height), self.samples)
+
+    if "dpi" not in kwargs.keys():
+        kwargs["dpi"] = (self.xres, self.yres)
+
+    img.save(*args, **kwargs)
+
+def pillowData(self, *args, **kwargs):
+    """Convert to binary image stream using pillow.
+
+    Arguments are passed to Pillow's Image.save() method.
+    Use it instead of getImageData when other output formats are needed.
+    """
+    from io import BytesIO
+    bytes_out = BytesIO()
+    self.pillowWrite(bytes_out, *args, **kwargs)
+    return bytes_out.getvalue()
+
+        %}
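`pillowWrite` chooses a Pillow image mode from the colorspace component count and the alpha flag before calling `Image.frombytes`. The selection can be isolated as below, without importing Pillow; `pillow_mode` is a hypothetical name. As in the original, there is no separate mode for CMYK with alpha:

```python
from typing import Optional


def pillow_mode(cs_n: Optional[int], alpha: bool) -> str:
    """Pillow mode chosen by Pixmap.pillowWrite.

    cs_n is the colorspace component count, or None without a colorspace.
    """
    if cs_n is None:
        return "L"
    if cs_n == 1:
        return "LA" if alpha else "L"
    if cs_n == 3:
        return "RGBA" if alpha else "RGB"
    return "CMYK"  # alpha is not considered here


print(pillow_mode(3, True))   # RGBA
print(pillow_mode(4, False))  # CMYK
```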
+        //----------------------------------------------------------------------
+        // invertIRect
+        //----------------------------------------------------------------------
+        %pythonprepend invertIRect
+        %{"""Invert the colors inside a bbox."""%}
+        PyObject *invertIRect(PyObject *bbox = NULL)
+        {
+            fz_pixmap *pm = (fz_pixmap *) $self;
+            if (!fz_pixmap_colorspace(gctx, pm))
+                {
+                    JM_Warning("ignored for stencil pixmap");
+                    return JM_BOOL(0);
+                }
+
+            fz_irect r = JM_irect_from_py(bbox);
+            if (fz_is_infinite_irect(r))
+                r = fz_pixmap_bbox(gctx, pm);
+
+            return JM_BOOL(JM_invert_pixmap_rect(gctx, pm, r));
+        }
+
+        //----------------------------------------------------------------------
+        // get one pixel as a list
+        //----------------------------------------------------------------------
+        FITZEXCEPTION(pixel, !result)
+        %pythonprepend pixel
+%{"""Get color tuple of pixel (x, y).
+Last item is the alpha if Pixmap.alpha is true."""%}
+        PyObject *pixel(int x, int y)
+        {
+            PyObject *p = NULL;
+            fz_try(gctx) {
+                fz_pixmap *pm = (fz_pixmap *) $self;
+                if (!INRANGE(x, 0, pm->w - 1) || !INRANGE(y, 0, pm->h - 1))
+                    THROWMSG("coordinates outside image");
+                int n = pm->n;
+                int stride = fz_pixmap_stride(gctx, pm);
+                int j, i = stride * y + n * x;
+                p = PyTuple_New(n);
+                for (j = 0; j < n; j++) {
+                    PyTuple_SET_ITEM(p, j, Py_BuildValue("i", pm->samples[i + j]));
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return p;
+        }
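`pixel()` locates a pixel at byte offset stride * y + n * x and returns n values, the last being alpha when present. A pure-Python sketch, assuming stride == w * n as the `stride` property documents; `pixel_tuple` is hypothetical:

```python
def pixel_tuple(samples: bytes, w: int, h: int, n: int, x: int, y: int) -> tuple:
    """Return the n component values of pixel (x, y).

    n includes the alpha byte, so the last value is alpha when present.
    """
    if not (0 <= x < w and 0 <= y < h):
        raise ValueError("coordinates outside image")
    stride = w * n          # bytes per image row (no padding assumed)
    i = stride * y + n * x  # offset of the pixel's first component
    return tuple(samples[i:i + n])


data = bytes(range(12))                  # 2 x 2 RGB pixmap
print(pixel_tuple(data, 2, 2, 3, 1, 1))  # (9, 10, 11)
```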
+
+        //----------------------------------------------------------------------
+        // Set one pixel to a given color tuple
+        //----------------------------------------------------------------------
+        FITZEXCEPTION(setPixel, !result)
+        %pythonprepend setPixel
+        %{"""Set color of pixel (x, y)."""%}
+        PyObject *setPixel(int x, int y, PyObject *color)
+        {
+            fz_try(gctx) {
+                fz_pixmap *pm = (fz_pixmap *) $self;
+                if (!INRANGE(x, 0, pm->w - 1) || !INRANGE(y, 0, pm->h - 1))
+                    THROWMSG("outside image");
+                int n = pm->n;
+                if (!PySequence_Check(color) || PySequence_Size(color) != n)
+                    THROWMSG("bad color arg");
+                int i, j;
+                unsigned char c[5];
+                for (j = 0; j < n; j++) {
+                    // read component j as an int (returns 1 on failure)
+                    if (JM_INT_ITEM(color, j, &i) == 1)
+                        THROWMSG("bad pixel component");
+                    if (!INRANGE(i, 0, 255)) THROWMSG("bad pixel component");
+                    c[j] = (unsigned char) i;
+                }
+                int stride = fz_pixmap_stride(gctx, pm);
+                i = stride * y + n * x;
+                for (j = 0; j < n; j++) {
+                    pm->samples[i + j] = c[j];
+                }
+            }
+            fz_catch(gctx) {
+                PyErr_Clear();
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        //----------------------------------------------------------------------
+        // Set Pixmap resolution
+        //----------------------------------------------------------------------
+        %pythonprepend setResolution
+%{"""Set resolution in both dimensions.
+
+Use pillowWrite to reflect this in output image."""%}
+        PyObject *setResolution(int xres, int yres)
+        {
+            fz_pixmap *pm = (fz_pixmap *) $self;
+            pm->xres = xres;
+            pm->yres = yres;
+            Py_RETURN_NONE;
+        }
+
+        //----------------------------------------------------------------------
+        // Set a rect to a given color tuple
+        //----------------------------------------------------------------------
+        FITZEXCEPTION(setRect, !result)
+        %pythonprepend setRect
+        %{"""Set color of all pixels in bbox."""%}
+        PyObject *setRect(PyObject *bbox, PyObject *color)
+        {
+            PyObject *rc = NULL;
+            fz_try(gctx) {
+                fz_pixmap *pm = (fz_pixmap *) $self;
+                Py_ssize_t j, n = (Py_ssize_t) pm->n;
+                if (!PySequence_Check(color) || PySequence_Size(color) != n)
+                    THROWMSG("bad color arg");
+                unsigned char c[5];
+                int i;
+                for (j = 0; j < n; j++) {
+                    if (JM_INT_ITEM(color, j, &i) == 1)
+                        THROWMSG("bad color component");
+                    if (!INRANGE(i, 0, 255))
+                        THROWMSG("bad color component");
+                    c[j] = (unsigned char) i;
+                }
+                i = JM_fill_pixmap_rect_with_color(gctx, pm, c, JM_irect_from_py(bbox));
+                rc = JM_BOOL(i);
+            }
+            fz_catch(gctx) {
+                PyErr_Clear();
+                return NULL;
+            }
+            return rc;
+        }
+
+        //----------------------------------------------------------------------
+        // get length of one image row
+        //----------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend stride %{"""Length of one image line (width * n)."""%}
+        int stride()
+        {
+            return fz_pixmap_stride(gctx, (fz_pixmap *) $self);
+        }
+
+        //----------------------------------------------------------------------
+        // x, y, width, height, xres, yres, n
+        //----------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend xres %{"""Resolution in x direction."""%}
+        int xres()
+        {
+            fz_pixmap *this_pix = (fz_pixmap *) $self;
+            return this_pix->xres;
+        }
+
+        %pythoncode %{@property%}
+        %pythonprepend yres %{"""Resolution in y direction."""%}
+        int yres()
+        {
+            fz_pixmap *this_pix = (fz_pixmap *) $self;
+            return this_pix->yres;
+        }
+
+        %pythoncode %{@property%}
+        %pythonprepend w %{"""The width."""%}
+        int w()
+        {
+            return fz_pixmap_width(gctx, (fz_pixmap *) $self);
+        }
+
+        %pythoncode %{@property%}
+        %pythonprepend h %{"""The height."""%}
+        int h()
+        {
+            return fz_pixmap_height(gctx, (fz_pixmap *) $self);
+        }
+
+        %pythoncode %{@property%}
+        %pythonprepend x %{"""x component of Pixmap origin."""%}
+        int x()
+        {
+            return fz_pixmap_x(gctx, (fz_pixmap *) $self);
+        }
+
+        %pythoncode %{@property%}
+        %pythonprepend y %{"""y component of Pixmap origin."""%}
+        int y()
+        {
+            return fz_pixmap_y(gctx, (fz_pixmap *) $self);
+        }
+
+        %pythoncode %{@property%}
+        %pythonprepend n %{"""Number of components per pixel, including alpha."""%}
+        int n()
+        {
+            return fz_pixmap_components(gctx, (fz_pixmap *) $self);
+        }
+
+        //----------------------------------------------------------------------
+        // check alpha channel
+        //----------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend alpha %{"""Indicates presence of alpha channel."""%}
+        int alpha()
+        {
+            return fz_pixmap_alpha(gctx, (fz_pixmap *) $self);
+        }
+
+        //----------------------------------------------------------------------
+        // get colorspace of pixmap
+        //----------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend colorspace %{"""Pixmap Colorspace."""%}
+        struct Colorspace *colorspace()
+        {
+            return (struct Colorspace *) fz_pixmap_colorspace(gctx, (fz_pixmap *) $self);
+        }
+
+        //----------------------------------------------------------------------
+        // return irect of pixmap
+        //----------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend irect %{"""Pixmap bbox - an IRect object."""%}
+        %pythonappend irect %{val = IRect(val)%}
+        PyObject *irect()
+        {
+            return JM_py_from_irect(fz_pixmap_bbox(gctx, (fz_pixmap *) $self));
+        }
+
+        //----------------------------------------------------------------------
+        // return size of pixmap
+        //----------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend size %{"""Pixmap memory size in bytes (samples plus header)."""%}
+        int size()
+        {
+            return (int) fz_pixmap_size(gctx, (fz_pixmap *) $self);
+        }
+
+        //----------------------------------------------------------------------
+        // samples
+        //----------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend samples %{"""Bytes object of all pixel samples (length w * h * n)."""%}
+        PyObject *samples()
+        {
+            fz_pixmap *pm = (fz_pixmap *) $self;
+            return PyBytes_FromStringAndSize((const char *) pm->samples, (Py_ssize_t) (pm->w)*(pm->h)*(pm->n));
+        }
+
+        %pythoncode %{
+        width  = w
+        height = h
+
+        def __len__(self):
+            return self.size
+
+        def __repr__(self):
+            if not type(self) is Pixmap: return
+            if self.colorspace:
+                return "Pixmap(%s, %s, %s)" % (self.colorspace.name, self.irect, self.alpha)
+            else:
+                return "Pixmap(%s, %s, %s)" % ('None', self.irect, self.alpha)
+
+        def __del__(self):
+            if not type(self) is Pixmap: return
+            self.__swig_destroy__(self)
+        %}
+    }
+};
+
+/* fz_colorspace */
+struct Colorspace
+{
+    %extend {
+        ~Colorspace()
+        {
+            DEBUGMSG1("Colorspace");
+            fz_colorspace *this_cs = (fz_colorspace *) $self;
+            fz_drop_colorspace(gctx, this_cs);
+            DEBUGMSG2;
+        }
+
+        %pythonprepend Colorspace
+        %{"""Supported types are CS_GRAY, CS_RGB and CS_CMYK; any other value yields RGB."""%}
+        Colorspace(int type)
+        {
+            fz_colorspace *cs = NULL;
+            switch(type) {
+                case CS_GRAY:
+                    cs = fz_device_gray(gctx);
+                    break;
+                case CS_CMYK:
+                    cs = fz_device_cmyk(gctx);
+                    break;
+                case CS_RGB:
+                default:
+                    cs = fz_device_rgb(gctx);
+                    break;
+            }
+            return (struct Colorspace *) cs;
+        }
+        //----------------------------------------------------------------------
+        // number of bytes to define color of one pixel
+        //----------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend n %{"""Number of color components."""%}
+        PyObject *n()
+        {
+            return Py_BuildValue("i", fz_colorspace_n(gctx, (fz_colorspace *) $self));
+        }
+
+        //----------------------------------------------------------------------
+        // name of colorspace
+        //----------------------------------------------------------------------
+        PyObject *_name()
+        {
+            return JM_UnicodeFromStr(fz_colorspace_name(gctx, (fz_colorspace *) $self));
+        }
+
+        %pythoncode %{
+        @property
+        def name(self):
+            """Name of the Colorspace."""
+
+            if self.n == 1:
+                return csGRAY._name()
+            elif self.n == 3:
+                return csRGB._name()
+            elif self.n == 4:
+                return csCMYK._name()
+            return self._name()
+
+        def __repr__(self):
+            x = ("", "GRAY", "", "RGB", "CMYK")[self.n]
+            return "Colorspace(CS_%s) - %s" % (x, self.name)
+        %}
+    }
+};
+
+
+/* fz_device wrapper */
+%rename(Device) DeviceWrapper;
+struct DeviceWrapper
+{
+    %extend {
+        FITZEXCEPTION(DeviceWrapper, !result)
+        DeviceWrapper(struct Pixmap *pm, PyObject *clip) {
+            struct DeviceWrapper *dw = NULL;
+            fz_try(gctx) {
+                dw = (struct DeviceWrapper *)calloc(1, sizeof(struct DeviceWrapper));
+                fz_irect bbox = JM_irect_from_py(clip);
+                if (fz_is_infinite_irect(bbox))
+                    dw->device = fz_new_draw_device(gctx, fz_identity, (fz_pixmap *) pm);
+                else
+                    dw->device = fz_new_draw_device_with_bbox(gctx, fz_identity, (fz_pixmap *) pm, &bbox);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return dw;
+        }
+        DeviceWrapper(struct DisplayList *dl) {
+            struct DeviceWrapper *dw = NULL;
+            fz_try(gctx) {
+                dw = (struct DeviceWrapper *)calloc(1, sizeof(struct DeviceWrapper));
+                dw->device = fz_new_list_device(gctx, (fz_display_list *) dl);
+                dw->list = (fz_display_list *) dl;
+                fz_keep_display_list(gctx, (fz_display_list *) dl);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return dw;
+        }
+        DeviceWrapper(struct TextPage *tp, int flags = 0) {
+            struct DeviceWrapper *dw = NULL;
+            fz_try(gctx) {
+                dw = (struct DeviceWrapper *)calloc(1, sizeof(struct DeviceWrapper));
+                fz_stext_options opts = { 0 };
+                opts.flags = flags;
+                dw->device = fz_new_stext_device(gctx, (fz_stext_page *) tp, &opts);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return dw;
+        }
+        ~DeviceWrapper() {
+            fz_display_list *list = $self->list;
+            DEBUGMSG1("Device");
+            fz_close_device(gctx, $self->device);
+            fz_drop_device(gctx, $self->device);
+            DEBUGMSG2;
+            if(list)
+            {
+                DEBUGMSG1("DisplayList after Device");
+                fz_drop_display_list(gctx, list);
+                DEBUGMSG2;
+            }
+        }
+    }
+};
+
+//-----------------------------------------------------------------------------
+// fz_outline
+//-----------------------------------------------------------------------------
+%nodefaultctor;
+struct Outline {
+    %immutable;
+/*
+    fz_outline doesn't keep a reference count in MuPDF's code,
+    which means that if the root outline node is dropped,
+    all the outline nodes will also be destroyed.
+
+    As a result, if the root Outline Python object is released,
+    the other Outline objects will point to already freed memory. E.g.:
+    >>> import fitz
+    >>> doc=fitz.Document('3.pdf')
+    >>> ol=doc.loadOutline()
+    >>> oln=ol.next
+    >>> oln.dest.page
+    5
+    >>> #drops root outline
+    ...
+    >>> ol=4
+    free outline
+    >>> oln.dest.page
+    0
+
+    I did not want to change the fz_document struct, so I decided
+    to delegate the outline destruction work to fz_document. That is,
+    when the Document is created, its outline is loaded in advance.
+    The outline will only be freed when the doc is destroyed, which means
+    that Python code must keep a reference to the doc as long as it
+    still uses the outline. This is a nasty workaround, but it requires
+    little change to the MuPDF code.
+    */
+/*
+    %extend {
+        ~Outline()
+        {
+            DEBUGMSG1("Outline");
+            fz_outline *this_ol = (fz_outline *) $self;
+            fz_drop_outline(gctx, this_ol);
+            DEBUGMSG2;
+        }
+    }
+*/
+    %extend {
+        %pythoncode %{@property%}
+        PyObject *uri()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            return JM_UnicodeFromStr(ol->uri);
+        }
+
+        %pythoncode %{@property%}
+        struct Outline *next()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            fz_outline *next_ol = ol->next;
+            if (!next_ol) return NULL;
+            next_ol = fz_keep_outline(gctx, next_ol);
+            return (struct Outline *) next_ol;
+        }
+
+        %pythoncode %{@property%}
+        struct Outline *down()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            fz_outline *down_ol = ol->down;
+            if (!down_ol) return NULL;
+            down_ol = fz_keep_outline(gctx, down_ol);
+            return (struct Outline *) down_ol;
+        }
+
+        %pythoncode %{@property%}
+        PyObject *isExternal()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            if (!ol->uri) Py_RETURN_FALSE;
+            return JM_BOOL(fz_is_external_link(gctx, ol->uri));
+        }
+
+        %pythoncode %{@property%}
+        int page()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            return ol->page;
+        }
+
+        %pythoncode %{@property%}
+        float x()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            return ol->x;
+        }
+
+        %pythoncode %{@property%}
+        float y()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            return ol->y;
+        }
+
+        %pythoncode %{@property%}
+        PyObject *title()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            return JM_UnicodeFromStr(ol->title);
+        }
+
+        %pythoncode %{@property%}
+        PyObject *is_open()
+        {
+            fz_outline *ol = (fz_outline *) $self;
+            return JM_BOOL(ol->is_open);
+        }
+
+        %pythoncode %{isOpen = is_open%}
+        %pythoncode %{
+        @property
+        def dest(self):
+            '''outline destination details'''
+            return linkDest(self, None)
+        %}
+    }
+};
+%clearnodefaultctor;
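Outline nodes form a tree. The `next` property links to the next sibling, and `down` links to the first child. Below is a minimal sketch of a depth-first walk that flattens such a tree into `[level, title, page]` rows. `OutlineNode` and `flatten_outline` are hypothetical stand-ins, used so the example runs without a PDF. When walking real `fitz.Outline` objects, keep the owning Document alive for the whole walk, as the comment in the struct above explains.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutlineNode:
    """Hypothetical stand-in for fitz.Outline with only the fields used here."""
    title: str
    page: int
    down: Optional["OutlineNode"] = None  # first child
    next: Optional["OutlineNode"] = None  # next sibling


def flatten_outline(node: Optional[OutlineNode], level: int = 1) -> List[list]:
    """Walk siblings via .next and descend into children via .down."""
    toc = []
    while node is not None:
        toc.append([level, node.title, node.page])
        if node.down is not None:
            toc.extend(flatten_outline(node.down, level + 1))
        node = node.next
    return toc


child = OutlineNode("1.1 Details", 2)
root = OutlineNode("1 Intro", 0, down=child, next=OutlineNode("2 End", 5))
print(flatten_outline(root))
# [[1, '1 Intro', 0], [2, '1.1 Details', 2], [1, '2 End', 5]]
```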
+
+
+//-----------------------------------------------------------------------------
+// Annotation
+//-----------------------------------------------------------------------------
+%nodefaultctor;
+struct Annot
+{
+    %extend
+    {
+        ~Annot()
+        {
+            DEBUGMSG1("Annot");
+            pdf_drop_annot(gctx, (pdf_annot *) $self);
+            DEBUGMSG2;
+        }
+        //---------------------------------------------------------------------
+        // annotation rectangle
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(rect, """Annotation rectangle.""")
+        %pythonappend rect %{
+        val = Rect(val)
+        val *= self.parent.derotationMatrix
+        %}
+        PyObject *rect()
+        {
+            fz_rect r = pdf_bound_annot(gctx, (pdf_annot *) $self);
+            return JM_py_from_rect(r);
+        }
+
+        //---------------------------------------------------------------------
+        // annotation get xref number
+        //---------------------------------------------------------------------
+        PARENTCHECK(xref, """Annotation xref.""")
+        %pythoncode %{@property%}
+        PyObject *xref()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            return Py_BuildValue("i", pdf_to_num(gctx, annot->obj));
+        }
+
+        //---------------------------------------------------------------------
+        // annotation get AP/N Matrix
+        //---------------------------------------------------------------------
+        PARENTCHECK(APNMatrix, """Annotation appearance matrix.""")
+        %pythonappend APNMatrix %{val = Matrix(val)%}
+        %pythoncode %{@property%}
+        PyObject *
+        APNMatrix()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP),
+                            PDF_NAME(N), NULL);
+            if (!ap)
+                return JM_py_from_matrix(fz_identity);
+            fz_matrix mat = pdf_dict_get_matrix(gctx, ap, PDF_NAME(Matrix));
+            return JM_py_from_matrix(mat);
+        }
+
+        //---------------------------------------------------------------------
+        // annotation get AP/N BBox
+        //---------------------------------------------------------------------
+        PARENTCHECK(APNBBox, """Annotation appearance bbox.""")
+        %pythonappend APNBBox %{
+        val = Rect(val) * self.parent.transformationMatrix
+        val *= self.parent.derotationMatrix%}
+        %pythoncode %{@property%}
+        PyObject *
+        APNBBox()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP),
+                            PDF_NAME(N), NULL);
+            if (!ap)
+                return JM_py_from_rect(fz_infinite_rect);
+            fz_rect rect = pdf_dict_get_rect(gctx, ap, PDF_NAME(BBox));
+            return JM_py_from_rect(rect);
+        }
+
+
+        //---------------------------------------------------------------------
+        // annotation set AP/N Matrix
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setAPNMatrix, !result)
+        PARENTCHECK(setAPNMatrix, """Set annotation appearance matrix.""")
+        PyObject *
+        setAPNMatrix(PyObject *matrix)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            fz_try(gctx) {
+                pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP),
+                                                PDF_NAME(N), NULL);
+                if (!ap) THROWMSG("annot has no appearance stream");
+                fz_matrix mat = JM_matrix_from_py(matrix);
+                pdf_dict_put_matrix(gctx, ap, PDF_NAME(Matrix), mat);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        //---------------------------------------------------------------------
+        // annotation set AP/N BBox
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setAPNBBox, !result)
+        %pythonprepend setAPNBBox %{
+        """Set annotation appearance bbox."""
+
+        CheckParent(self)
+        page = self.parent
+        rot = page.rotationMatrix
+        mat = page.transformationMatrix
+        bbox *= rot * ~mat
+        %}
+        PyObject *
+        setAPNBBox(PyObject *bbox)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            fz_try(gctx) {
+                pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP),
+                                                PDF_NAME(N), NULL);
+                if (!ap) THROWMSG("annot has no appearance stream");
+                fz_rect rect = JM_rect_from_py(bbox);
+                pdf_dict_put_rect(gctx, ap, PDF_NAME(BBox), rect);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        //---------------------------------------------------------------------
+        // annotation show blend mode (/BM)
+        //---------------------------------------------------------------------
+        PARENTCHECK(blendMode, """Annotation BlendMode.""")
+        PyObject *blendMode()
+        {
+            PyObject *blend_mode = NULL;
+            fz_try(gctx) {
+                pdf_annot *annot = (pdf_annot *) $self;
+                pdf_obj *obj, *obj1, *obj2;
+                obj = pdf_dict_get(gctx, annot->obj, PDF_NAME(BM));
+                if (obj) {  // check the annot object for /BM
+                    blend_mode = JM_UnicodeFromStr(pdf_to_name(gctx, obj));
+                    goto fertig;
+                }
+                // loop through the /AP/N/Resources/ExtGState objects
+                obj = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP),
+                    PDF_NAME(N),
+                    PDF_NAME(Resources),
+                    PDF_NAME(ExtGState),
+                    NULL);
+
+                if (pdf_is_dict(gctx, obj)) {
+                    int i, j, m, n = pdf_dict_len(gctx, obj);
+                    for (i = 0; i < n; i++) {
+                        obj1 = pdf_dict_get_val(gctx, obj, i);
+                        if (pdf_is_dict(gctx, obj1)) {
+                            m = pdf_dict_len(gctx, obj1);
+                            for (j = 0; j < m; j++) {
+                                obj2 = pdf_dict_get_key(gctx, obj1, j);
+                                if (pdf_objcmp(gctx, obj2, PDF_NAME(BM)) == 0) {
+                                    blend_mode = JM_UnicodeFromStr(pdf_to_name(gctx, pdf_dict_get_val(gctx, obj1, j)));
+                                    goto fertig;
+                                }
+                            }
+                        }
+                    }
+                }
+                fertig:;
+            }
+            fz_catch(gctx) {
+                return_none;
+            }
+            if (blend_mode) return blend_mode;
+            return_none;
+        }
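`blendMode()` searches in a fixed order. It first looks for `/BM` in the annotation dictionary. If none is there, it returns the first `/BM` found among the `/AP/N/Resources/ExtGState` entries. Below is a sketch of that order, with the PDF objects modeled as nested Python dicts. The dict model and `find_blend_mode` are assumptions for illustration, not PyMuPDF API.

```python
from typing import Optional


def find_blend_mode(annot: dict) -> Optional[str]:
    """Return the blend mode using the same search order as Annot.blendMode().

    The annotation is modeled as nested dicts (a hypothetical stand-in for
    the PDF objects); keys are PDF names without the leading slash.
    """
    # 1. A /BM entry directly in the annotation dictionary wins.
    if "BM" in annot:
        return annot["BM"]
    # 2. Otherwise use the first /BM found in /AP/N/Resources/ExtGState.
    extgstate = annot.get("AP", {}).get("N", {}).get("Resources", {}).get("ExtGState", {})
    for gstate in extgstate.values():
        if isinstance(gstate, dict) and "BM" in gstate:
            return gstate["BM"]
    # 3. No blend mode anywhere: the C code returns None as well.
    return None


annot = {"AP": {"N": {"Resources": {"ExtGState": {"H": {"CA": 0.5, "BM": "Multiply"}}}}}}
print(find_blend_mode(annot))  # Multiply
```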
+
+
+        //---------------------------------------------------------------------
+        // annotation set blend mode (/BM)
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setBlendMode, !result)
+        PARENTCHECK(setBlendMode, """Set annotation BlendMode.""")
+        PyObject *setBlendMode(char *blend_mode)
+        {
+            fz_try(gctx) {
+                pdf_annot *annot = (pdf_annot *) $self;
+                pdf_dict_put_name(gctx, annot->obj, PDF_NAME(BM), blend_mode);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        %pythoncode %{@property%}
+        %pythonprepend language %{"""Annotation language."""%}
+        PyObject *language()
+        {
+            pdf_annot *this_annot = (pdf_annot *) $self;
+            fz_text_language lang = pdf_annot_language(gctx, this_annot);
+            char buf[8];
+            if (lang == FZ_LANG_UNSET) return_none;
+            return Py_BuildValue("s", fz_string_from_text_language(buf, lang));
+        }
+
+        //---------------------------------------------------------------------
+        // annotation set language (/Lang)
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setLanguage, !result)
+        PARENTCHECK(setLanguage, """Set annotation language.""")
+        PyObject *setLanguage(char *language=NULL)
+        {
+            pdf_annot *this_annot = (pdf_annot *) $self;
+            fz_try(gctx) {
+                fz_text_language lang;
+                if (!language)
+                    lang = FZ_LANG_UNSET;
+                else
+                    lang = fz_text_language_from_string(language);
+                pdf_set_annot_language(gctx, this_annot, lang);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            Py_RETURN_TRUE;
+        }
+
+
+        //---------------------------------------------------------------------
+        // annotation get decompressed appearance stream source
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getAP, !result)
+        PyObject *_getAP()
+        {
+            PyObject *r = NULL;
+            fz_buffer *res = NULL;
+            fz_var(r);
+            fz_var(res);  // res is used in fz_always after a possible longjmp
+            fz_try(gctx) {
+                pdf_annot *annot = (pdf_annot *) $self;
+                pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP),
+                                              PDF_NAME(N), NULL);
+
+                if (pdf_is_stream(gctx, ap))  res = pdf_load_stream(gctx, ap);
+                if (res) r = JM_BinFromBuffer(gctx, res);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return_none;
+            }
+            if (!r) return_none;  // no /AP/N stream: return a new reference to None
+            return r;
+        }
+
+        //---------------------------------------------------------------------
+        // annotation update /AP stream
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_setAP, !result)
+        PyObject *_setAP(PyObject *ap, int rect = 0)
+        {
+            fz_buffer *res = NULL;
+            fz_var(res);
+            fz_try(gctx) {
+                pdf_annot *annot = (pdf_annot *) $self;
+                pdf_obj *apobj = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP),
+                                              PDF_NAME(N), NULL);
+                if (!apobj) THROWMSG("annot has no /AP/N object");
+                if (!pdf_is_stream(gctx, apobj))
+                    THROWMSG("/AP/N object is not a stream");
+                res = JM_BufferFromBytes(gctx, ap);
+                if (!res) THROWMSG("invalid /AP stream argument");
+                JM_update_stream(gctx, annot->page->doc, apobj, res, 1);
+                if (rect) {
+                    fz_rect bbox = pdf_dict_get_rect(gctx, annot->obj, PDF_NAME(Rect));
+                    pdf_dict_put_rect(gctx, apobj, PDF_NAME(BBox), bbox);
+                    annot->ap = NULL;
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        //---------------------------------------------------------------------
+        // redaction annotation get values
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_get_redact_values, !result)
+        %pythonappend _get_redact_values %{
+        if not val:
+            return val
+        val["rect"] = self.rect
+        text_color, fontname, fontsize = TOOLS._parse_da(self)
+        val["text_color"] = text_color
+        val["fontname"] = fontname
+        val["fontsize"] = fontsize
+        fill = self.colors["fill"]
+        val["fill"] = fill
+
+        %}
+        PyObject *
+        _get_redact_values()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            if (pdf_annot_type(gctx, annot) != PDF_ANNOT_REDACT)
+                return_none;
+
+            PyObject *values = PyDict_New();
+            const char *text = NULL;
+            fz_try(gctx) {
+                pdf_obj *obj = pdf_dict_gets(gctx, annot->obj, "RO");
+                if (obj) {
+                    THROWMSG("unsupported redaction key '/RO'.");
+                }
+                obj = pdf_dict_gets(gctx, annot->obj, "OverlayText");
+                if (obj) {
+                    text = pdf_to_text_string(gctx, obj);
+                    DICT_SETITEM_DROP(values, dictkey_text, JM_UnicodeFromStr(text));
+                } else {
+                    DICT_SETITEM_DROP(values, dictkey_text, Py_BuildValue("s", ""));
+                }
+                obj = pdf_dict_get(gctx, annot->obj, PDF_NAME(Q));
+                int align = 0;
+                if (obj) {
+                    align = pdf_to_int(gctx, obj);
+                }
+                DICT_SETITEM_DROP(values, dictkey_align, Py_BuildValue("i", align));
+            }
+            fz_catch(gctx) {
+                Py_DECREF(values);
+                return NULL;
+            }
+            return values;
+        }
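For a redaction annotation, `_get_redact_values` fills in defaults when keys are missing. If there is no `/OverlayText`, the text is the empty string. If there is no `/Q`, the alignment is 0 (left-justified). A `/RO` key is rejected outright. Below is a sketch of those rules on a plain-dict model of the annotation dictionary. The dict model is hypothetical. The result keys `text` and `align` are inferred from the `dictkey_text` and `dictkey_align` identifiers.

```python
def redact_values(annot: dict) -> dict:
    """Return the /OverlayText and /Q values of a redaction annotation.

    `annot` models the annotation dictionary as a plain dict (hypothetical);
    missing keys get the same defaults as _get_redact_values.
    """
    if "RO" in annot:
        # Redactions that carry a /RO entry are not supported.
        raise ValueError("unsupported redaction key '/RO'.")
    return {
        "text": annot.get("OverlayText", ""),  # empty string if absent
        "align": int(annot.get("Q", 0)),  # 0 = left, 1 = centered, 2 = right
    }


print(redact_values({"OverlayText": "REDACTED", "Q": 1}))
```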
+
+        //---------------------------------------------------------------------
+        // annotation set name
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setName, !result)
+        PARENTCHECK(setName, """Set /Name (icon) of annotation.""")
+        PyObject *setName(char *name)
+        {
+            fz_try(gctx) {
+                pdf_annot *annot = (pdf_annot *) $self;
+                pdf_dict_put_name(gctx, annot->obj, PDF_NAME(Name), name);
+                pdf_dirty_annot(gctx, annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // annotation set rectangle
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setRect, !result)
+        PARENTCHECK(setRect, """Set annotation rectangle.""")
+        PyObject *setRect(PyObject *rect)
+        {
+            fz_try(gctx) {
+                pdf_annot *annot = (pdf_annot *) $self;
+                pdf_page *pdfpage = annot->page;
+                fz_matrix rot = JM_rotate_page_matrix(gctx, pdfpage);
+                fz_rect r = fz_transform_rect(JM_rect_from_py(rect), rot);
+                pdf_set_annot_rect(gctx, annot, r);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // annotation set rotation
+        //---------------------------------------------------------------------
+        PARENTCHECK(setRotation, """Set annotation rotation.""")
+        PyObject *setRotation(int rotate=0)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            int type = pdf_annot_type(gctx, annot);
+            switch (type)
+            {
+                case PDF_ANNOT_CARET: break;
+                case PDF_ANNOT_CIRCLE: break;
+                case PDF_ANNOT_FREE_TEXT: break;
+                case PDF_ANNOT_FILE_ATTACHMENT: break;
+                case PDF_ANNOT_INK: break;
+                case PDF_ANNOT_LINE: break;
+                case PDF_ANNOT_POLY_LINE: break;
+                case PDF_ANNOT_POLYGON: break;
+                case PDF_ANNOT_SQUARE: break;
+                case PDF_ANNOT_STAMP: break;
+                case PDF_ANNOT_TEXT: break;
+                default: return_none;
+            }
+            int rot = rotate;
+            while (rot < 0) rot += 360;
+            while (rot >= 360) rot -= 360;
+            if (type == PDF_ANNOT_FREE_TEXT && rot % 90 != 0)
+                rot = 0;
+
+            pdf_dict_put_int(gctx, annot->obj, PDF_NAME(Rotate), rot);
+            return_none;
+        }
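`setRotation` uses two loops to bring the angle into the range [0, 360). It then sets any value that is not a multiple of 90 to 0 for 'FreeText' annotations. The loops are needed in C because `%` can return negative values there. In Python, `%` with a positive modulus is never negative, so a single modulo does the same job. Below is a sketch of the rule. The function name is hypothetical.

```python
def normalize_rotation(rotate: int, free_text: bool = False) -> int:
    """Reduce an angle to [0, 360) the way setRotation does (sketch)."""
    rot = rotate % 360  # Python's % already maps negative values into [0, 360)
    if free_text and rot % 90 != 0:
        rot = 0  # FreeText only accepts multiples of 90 degrees
    return rot


print(normalize_rotation(-90), normalize_rotation(45, free_text=True))  # 270 0
```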
+
+        //---------------------------------------------------------------------
+        // annotation get rotation
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(rotation, """Annotation rotation.""")
+        int rotation()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            pdf_obj *rotation = pdf_dict_get(gctx, annot->obj, PDF_NAME(Rotate));
+            if (!rotation) return -1;
+            return pdf_to_int(gctx, rotation);
+        }
+
+        //---------------------------------------------------------------------
+        // annotation vertices (for "Line", "Polygon", "Ink", etc.)
+        //---------------------------------------------------------------------
+        PARENTCHECK(vertices, """Vertex points.""")
+        %pythoncode %{@property%}
+        PyObject *vertices()
+        {
+            PyObject *res = NULL, *res1 = NULL;
+            pdf_obj *o, *o1;
+            pdf_annot *annot = (pdf_annot *) $self;
+            int i, j;
+            fz_point point;  // point object to work with
+            fz_matrix page_ctm;  // page transformation matrix
+            pdf_page_transform(gctx, annot->page, NULL, &page_ctm);
+            fz_matrix derot = JM_derotate_page_matrix(gctx, annot->page);
+            page_ctm = fz_concat(page_ctm, derot);
+
+            //----------------------------------------------------------------
+            // Each of the following keys belongs to a different annotation
+            // type, so at most one of the lookups below succeeds (o != NULL).
+            // Every pair of floats is one point, which must be separately
+            // transformed with the page transformation matrix.
+            //----------------------------------------------------------------
+            o = pdf_dict_get(gctx, annot->obj, PDF_NAME(Vertices));
+            if (o) goto weiter;
+            o = pdf_dict_get(gctx, annot->obj, PDF_NAME(L));
+            if (o) goto weiter;
+            o = pdf_dict_get(gctx, annot->obj, PDF_NAME(QuadPoints));
+            if (o) goto weiter;
+            o = pdf_dict_gets(gctx, annot->obj, "CL");
+            if (o) goto weiter;
+            o = pdf_dict_get(gctx, annot->obj, PDF_NAME(InkList));
+            if (o) goto inklist;
+            return_none;
+
+            // handle lists with 1-level depth --------------------------------
+            weiter:;
+            res = PyList_New(0);  // create Python list
+            for (i = 0; i < pdf_array_len(gctx, o); i += 2)
+            {
+                point.x = pdf_to_real(gctx, pdf_array_get(gctx, o, i));
+                point.y = pdf_to_real(gctx, pdf_array_get(gctx, o, i+1));
+                point = fz_transform_point(point, page_ctm);
+                LIST_APPEND_DROP(res, Py_BuildValue("ff", point.x, point.y));
+            }
+            return res;
+
+            // InkList has 2-level lists --------------------------------------
+            inklist:;
+            res = PyList_New(0);
+            for (i = 0; i < pdf_array_len(gctx, o); i++)
+            {
+                res1 = PyList_New(0);
+                o1 = pdf_array_get(gctx, o, i);
+                for (j = 0; j < pdf_array_len(gctx, o1); j += 2)
+                {
+                    point.x = pdf_to_real(gctx, pdf_array_get(gctx, o1, j));
+                    point.y = pdf_to_real(gctx, pdf_array_get(gctx, o1, j+1));
+                    point = fz_transform_point(point, page_ctm);
+                    LIST_APPEND_DROP(res1, Py_BuildValue("ff", point.x, point.y));
+                }
+                LIST_APPEND_DROP(res, res1);
+            }
+            return res;
+        }
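In `/Vertices`, `/L`, `/QuadPoints` and `/CL`, each consecutive pair of numbers is one point. `/InkList` holds one such flat array per stroke. Every point is mapped through the page matrix using the `fz_transform_point` convention: x' = x*a + y*c + e and y' = x*b + y*d + f. Below is a minimal sketch of the pairing and the transform. It assumes the matrix is a plain 6-tuple `(a, b, c, d, e, f)`, and it uses a hypothetical y-flip for a 792 pt high page instead of the real `page_ctm`. It also rejects arrays of odd length, which is a choice made for the sketch; the C loop does no such check.

```python
from typing import List, Sequence, Tuple

# MuPDF-style matrix (a, b, c, d, e, f).
Matrix = Tuple[float, float, float, float, float, float]
Point = Tuple[float, float]


def transform_point(x: float, y: float, m: Matrix) -> Point:
    """Apply x' = x*a + y*c + e, y' = x*b + y*d + f."""
    a, b, c, d, e, f = m
    return (x * a + y * c + e, x * b + y * d + f)


def pairs_to_points(flat: Sequence[float], m: Matrix) -> List[Point]:
    """Turn [x0, y0, x1, y1, ...] into transformed (x, y) tuples."""
    if len(flat) % 2:
        raise ValueError("coordinate array must have an even length")
    return [transform_point(flat[i], flat[i + 1], m) for i in range(0, len(flat), 2)]


def inklist_to_points(ink: Sequence[Sequence[float]], m: Matrix) -> List[List[Point]]:
    """/InkList is one level deeper: one flat coordinate array per stroke."""
    return [pairs_to_points(stroke, m) for stroke in ink]


# Hypothetical y-flip for a 792 pt high page (not the real page_ctm).
flip = (1, 0, 0, -1, 0, 792)
print(pairs_to_points([100, 700, 200, 600], flip))  # [(100, 92), (200, 192)]
```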
+
+        //---------------------------------------------------------------------
+        // annotation colors
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(colors, """Color definitions.""")
+        PyObject *colors()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            return JM_annot_colors(gctx, annot->obj);
+        }
+
+        //---------------------------------------------------------------------
+        // annotation update appearance
+        //---------------------------------------------------------------------
+        PyObject *_update_appearance(float opacity=-1, char *blend_mode=NULL,
+            PyObject *fill_color=NULL,
+            int rotate = -1)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            int type = pdf_annot_type(gctx, annot);
+            float fcol[4] = {1,1,1,1};  // std fill color: white
+            int nfcol = 0;  // number of color components
+            JM_color_FromSequence(fill_color, &nfcol, fcol);
+            fz_try(gctx) {
+                pdf_dirty_annot(gctx, annot); // enforce MuPDF /AP formatting
+                if (type == PDF_ANNOT_FREE_TEXT && EXISTS(fill_color))
+                    pdf_set_annot_color(gctx, annot, nfcol, fcol);
+
+                int insert_rot = (rotate >= 0) ? 1 : 0;
+                switch (type) {
+                    case PDF_ANNOT_CARET:
+                    case PDF_ANNOT_CIRCLE:
+                    case PDF_ANNOT_FREE_TEXT:
+                    case PDF_ANNOT_FILE_ATTACHMENT:
+                    case PDF_ANNOT_INK:
+                    case PDF_ANNOT_LINE:
+                    case PDF_ANNOT_POLY_LINE:
+                    case PDF_ANNOT_POLYGON:
+                    case PDF_ANNOT_SQUARE:
+                    case PDF_ANNOT_STAMP:
+                    case PDF_ANNOT_TEXT: break;
+                    default: insert_rot = 0;
+                }
+
+                if (insert_rot)
+                    pdf_dict_put_int(gctx, annot->obj, PDF_NAME(Rotate), rotate);
+                annot->needs_new_ap = 1;  // re-create appearance stream
+                pdf_update_annot(gctx, annot);  // update the annotation
+
+            }
+            fz_catch(gctx) {
+                PySys_WriteStderr("cannot update annot: '%s'\n", fz_caught_message(gctx));
+                Py_RETURN_FALSE;
+            }
+
+            if ((opacity < 0 || opacity >= 1) && !blend_mode)  // no opacity, no blend_mode
+                Py_RETURN_TRUE;
+
+            fz_try(gctx) {  // create or update /ExtGState
+                pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP),
+                                        PDF_NAME(N), NULL);
+                if (!ap)  // should never happen
+                    THROWMSG("annot has no /AP object");
+
+                pdf_obj *resources = pdf_dict_get(gctx, ap, PDF_NAME(Resources));
+                if (!resources) {  // no Resources yet: make one
+                    resources = pdf_dict_put_dict(gctx, ap, PDF_NAME(Resources), 2);
+                }
+                pdf_obj *alp0 = pdf_new_dict(gctx, annot->page->doc, 3);
+                if (opacity >= 0 && opacity < 1) {
+                    pdf_dict_put_real(gctx, alp0, PDF_NAME(CA), (double) opacity);
+                    pdf_dict_put_real(gctx, alp0, PDF_NAME(ca), (double) opacity);
+                    pdf_dict_put_real(gctx, annot->obj, PDF_NAME(CA), (double) opacity);
+                }
+                if (blend_mode) {
+                    pdf_dict_put_name(gctx, alp0, PDF_NAME(BM), blend_mode);
+                    pdf_dict_put_name(gctx, annot->obj, PDF_NAME(BM), blend_mode);
+                }
+                pdf_obj *extg = pdf_dict_get(gctx, resources, PDF_NAME(ExtGState));
+                if (!extg) {  // no ExtGState yet: make one
+                    extg = pdf_dict_put_dict(gctx, resources, PDF_NAME(ExtGState), 2);
+                }
+                pdf_dict_put_drop(gctx, extg, PDF_NAME(H), alp0);
+            }
+
+            fz_catch(gctx) {
+                PySys_WriteStderr("could not set opacity or blend mode\n");
+                Py_RETURN_FALSE;
+            }
+            Py_RETURN_TRUE;
+        }
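Once the appearance stream has been regenerated, `_update_appearance` stores the opacity and blend mode in an `/ExtGState` entry named `/H`. This entry is written only when the opacity lies in [0, 1) or a blend mode is given. Below is a sketch of which keys end up in that entry, modeled as a plain dict. The function `build_extgstate` is hypothetical. The real code also copies `/CA` and `/BM` into the annotation dictionary itself.

```python
from typing import Optional


def build_extgstate(opacity: float = -1, blend_mode: Optional[str] = None) -> Optional[dict]:
    """Return the /H graphics-state entries _update_appearance would write, or None."""
    wants_opacity = 0 <= opacity < 1
    if not wants_opacity and blend_mode is None:
        return None  # no opacity change and no blend mode: nothing is written
    gs = {}
    if wants_opacity:
        gs["CA"] = opacity  # stroking alpha
        gs["ca"] = opacity  # non-stroking (fill) alpha
    if blend_mode is not None:
        gs["BM"] = blend_mode
    return gs


print(build_extgstate(0.5, "Multiply"))  # {'CA': 0.5, 'ca': 0.5, 'BM': 'Multiply'}
```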
+
+
+        %pythoncode %{
+        def update(self,
+                   blend_mode=None,
+                   opacity=None,
+                   fontsize=0,
+                   fontname=None,
+                   text_color=None,
+                   border_color=None,
+                   fill_color=None,
+                   cross_out=True,
+                   rotate=-1,
+                   ):
+
+            """Update annot appearance.
+
+            Notes:
+                Depending on the annot type, some parameters make no sense,
+                while others are only available in this method to achieve the
+                desired result - especially for 'FreeText' annots.
+            Args:
+                blend_mode: set the blend mode, all annotations.
+                opacity: set the opacity, all annotations.
+                fontsize: set fontsize, 'FreeText' only.
+                fontname: set the font, 'FreeText' only.
+                border_color: set border color, 'FreeText' only.
+                text_color: set text color, 'FreeText' only.
+                fill_color: set fill color, all annotations.
+                cross_out: draw diagonal lines, 'Redact' only.
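+                rotate: set rotation in degrees; honored for 'Caret', 'Circle',
+                    'FileAttachment', 'FreeText' (multiples of 90 only), 'Ink',
+                    'Line', 'PolyLine', 'Polygon', 'Square', 'Stamp' and 'Text'.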
+            """
+            CheckParent(self)
+            def color_string(cs, code):
+                """Return valid PDF color operator for a given color sequence.
+                """
+                if cs is None or (hasattr(cs, "__len__") and len(cs) == 0):
+                    return b""  # no color given (a scalar 0 still means black)
+                if hasattr(cs, "__float__") or len(cs) == 1:
+                    app = " g\n" if code == "f" else " G\n"
+                elif len(cs) == 3:
+                    app = " rg\n" if code == "f" else " RG\n"
+                elif len(cs) == 4:
+                    app = " k\n" if code == "f" else " K\n"
+                else:
+                    return b""
+
+                if hasattr(cs, "__len__"):
+                    col = " ".join(map(str, cs)) + app
+                else:
+                    col = "%g" % cs + app
+
+                return bytes(col, "utf8") if not fitz_py2 else col
+
+            type = self.type[0]  # get the annot type
+            dt = self.border["dashes"]  # get the dashes spec
+            bwidth = self.border["width"]  # get border line width
+            stroke = self.colors["stroke"]  # get the stroke color
+            if fill_color is not None and type in (PDF_ANNOT_FREE_TEXT, PDF_ANNOT_REDACT):
+                fill = fill_color
+            else:
+                fill = self.colors["fill"]
+                if not fill:
+                    fill = None
+
+            rect = None  # set below only if line end symbols enlarge the rect
+            apnmat = self.APNMatrix  # prevent MuPDF fiddling with it
+            if rotate != -1:  # sanitize rotation value
+                while rotate < 0:
+                    rotate += 360
+                while rotate >= 360:
+                    rotate -= 360
+                if type == PDF_ANNOT_FREE_TEXT and rotate % 90 != 0:
+                    rotate = 0
+
+            #------------------------------------------------------------------
+            # handle opacity and blend mode
+            #------------------------------------------------------------------
+            if blend_mode is None:
+                blend_mode = self.blendMode()
+            if not hasattr(opacity, "__float__"):
+                opacity = self.opacity
+
+            if 0 <= opacity < 1 or blend_mode is not None:
+                opa_code = "/H gs\n"  # then we must reference this 'gs'
+            else:
+                opa_code = ""
+
+            #------------------------------------------------------------------
+            # now invoke MuPDF to update the annot appearance
+            #------------------------------------------------------------------
+            val = self._update_appearance(
+                opacity=opacity,
+                blend_mode=blend_mode,
+                fill_color=fill,
+                rotate=rotate,
+            )
+            if not val:  # something went wrong, skip the rest
+                return val
+
+            bfill = color_string(fill, "f")
+            bstroke = color_string(stroke, "s")
+
+            p_ctm = self.parent.transformationMatrix
+            imat = ~p_ctm  # inverse page transf. matrix
+
+            if dt:
+                dashes = "[" + " ".join(map(str, dt)) + "] 0 d\n"
+                dashes = dashes.encode("utf-8")
+            else:
+                dashes = None
+
+            if self.lineEnds:
+                line_end_le, line_end_ri = self.lineEnds
+            else:
+                line_end_le, line_end_ri = 0, 0  # init line end codes
+
+            # read contents as created by MuPDF
+            ap = self._getAP()
+            ap_tab = ap.splitlines()  # split in single lines
+            ap_updated = False  # assume we did nothing
+
+            if type == PDF_ANNOT_REDACT:
+                if cross_out:  # create crossed-out rect
+                    ap_updated = True
+                    ap_tab = ap_tab[:-1]
+                    _, LL, LR, UR, UL = ap_tab
+                    ap_tab.append(LR)
+                    ap_tab.append(LL)
+                    ap_tab.append(UR)
+                    ap_tab.append(LL)
+                    ap_tab.append(UL)
+                    ap_tab.append(b"S")
+
+                if bwidth > 0 or bstroke != b"":
+                    ap_updated = True
+                    ntab = [b"%g w" % bwidth] if bwidth > 0 else []
+                    for line in ap_tab:
+                        if line.endswith(b"w"):
+                            continue
+                        if line.endswith(b"RG") and bstroke != b"":
+                            line = bstroke[:-1]
+                        ntab.append(line)
+                    ap_tab = ntab
+
+                ap = b"\n".join(ap_tab)
+
+            if type == PDF_ANNOT_FREE_TEXT:
+                CheckColor(border_color)
+                CheckColor(text_color)
+                tcol, fname, fsize = TOOLS._parse_da(self)
+
+                # read and update default appearance as necessary
+                update_default_appearance = False
+                if fsize <= 0:
+                    fsize = 12
+                    update_default_appearance = True
+                if text_color is not None:
+                    tcol = text_color
+                    update_default_appearance = True
+                if fontname is not None:
+                    fname = fontname
+                    update_default_appearance = True
+                if fontsize > 0:
+                    fsize = fontsize
+                    update_default_appearance = True
+
+                if len(tcol) == 3:
+                    fmt = "{:g} {:g} {:g} rg /{f:s} {s:g} Tf"
+                elif len(tcol) == 1:
+                    fmt = "{:g} g /{f:s} {s:g} Tf"
+                elif len(tcol) == 4:
+                    fmt = "{:g} {:g} {:g} {:g} k /{f:s} {s:g} Tf"
+                else:  # no usable text color: default to black
+                    tcol = (0,)
+                    fmt = "{:g} g /{f:s} {s:g} Tf"
+                da_str = fmt.format(*tcol, f=fname, s=fsize)
+                TOOLS._update_da(self, da_str)
+
+                for i, item in enumerate(ap_tab):
+                    if (item.endswith(b" w")
+                        and bwidth > 0
+                        and border_color is not None
+                       ):  # update border color
+                        ap_tab[i + 1] = color_string(border_color, "s")
+                        continue
+                    if item == b"BT":  # update text color
+                        ap_tab[i + 1] = color_string(tcol, "f")
+                        continue
+
+                if dashes is not None:  # handle dashes
+                    ap_tab.insert(0, dashes)
+                    dashes = None
+
+                ap = b"\n".join(ap_tab)         # updated AP stream
+                ap_updated = True
+
+            if type in (PDF_ANNOT_POLYGON, PDF_ANNOT_POLY_LINE):
+                ap = b"\n".join(ap_tab[:-1]) + b"\n"
+                ap_updated = True
+                if bfill != b"":
+                    if type == PDF_ANNOT_POLYGON:
+                        ap = ap + bfill + b"b"  # close, fill, and stroke
+                    elif type == PDF_ANNOT_POLY_LINE:
+                        ap = ap + bfill + b"B"  # fill and stroke
+                else:
+                    if type == PDF_ANNOT_POLYGON:
+                        ap = ap + b"s"  # close and stroke
+                    elif type == PDF_ANNOT_POLY_LINE:
+                        ap = ap + b"S"  # stroke
+
+            if dashes is not None:  # handle dashes
+                ap = dashes + ap
+                # reset dashing - only applies for LINE annots with line ends given
+                ap = ap.replace(b"\nS\n", b"\nS\n[] 0 d\n", 1)
+                ap_updated = True
+
+            if opa_code:
+                ap = opa_code.encode("utf-8") + ap
+                ap_updated = True
+
+            ap = b"q\n" + ap + b"\nQ\n"
+            #----------------------------------------------------------------------
+            # the following handles line end symbols for 'Polygon' and 'Polyline'
+            #----------------------------------------------------------------------
+            if line_end_le + line_end_ri > 0 and type in (PDF_ANNOT_POLYGON, PDF_ANNOT_POLY_LINE):
+
+                le_funcs = (None, TOOLS._le_square, TOOLS._le_circle,
+                            TOOLS._le_diamond, TOOLS._le_openarrow,
+                            TOOLS._le_closedarrow, TOOLS._le_butt,
+                            TOOLS._le_ropenarrow, TOOLS._le_rclosedarrow,
+                            TOOLS._le_slash)
+                le_funcs_range = range(1, len(le_funcs))
+                d = 2 * max(1, self.border["width"])
+                rect = self.rect + (-d, -d, d, d)
+                ap_updated = True
+                points = self.vertices
+                if line_end_le in le_funcs_range:
+                    p1 = Point(points[0]) * imat
+                    p2 = Point(points[1]) * imat
+                    left = le_funcs[line_end_le](self, p1, p2, False, fill_color)
+                    ap += bytes(left, "utf8") if not fitz_py2 else left
+                if line_end_ri in le_funcs_range:
+                    p1 = Point(points[-2]) * imat
+                    p2 = Point(points[-1]) * imat
+                    right = le_funcs[line_end_ri](self, p1, p2, True, fill_color)
+                    ap += bytes(right, "utf8") if not fitz_py2 else right
+
+            if ap_updated:
+                if rect:                        # rect modified here?
+                    self.setRect(rect)
+                    self._setAP(ap, rect=1)
+                else:
+                    self._setAP(ap, rect=0)
+
+            #-------------------------------
+            # handle annotation rotations
+            #-------------------------------
+            if type not in (  # only these types are supported
+                PDF_ANNOT_CARET,
+                PDF_ANNOT_CIRCLE,
+                PDF_ANNOT_FILE_ATTACHMENT,
+                PDF_ANNOT_INK,
+                PDF_ANNOT_LINE,
+                PDF_ANNOT_POLY_LINE,
+                PDF_ANNOT_POLYGON,
+                PDF_ANNOT_SQUARE,
+                PDF_ANNOT_STAMP,
+                PDF_ANNOT_TEXT,
+                ):
+                return
+
+            rot = self.rotation  # get value from annot object
+            if rot == -1:  # nothing to change
+                return
+
+            M = (self.rect.tl + self.rect.br) / 2  # center of annot rect
+
+            if rot == 0:  # undo rotations
+                if abs(apnmat - Matrix(1, 1)) < 1e-5:
+                    return  # matrix already is a no-op
+                quad = self.rect.morph(M, ~apnmat)  # derotate rect
+                self.setRect(quad.rect)
+                self.setAPNMatrix(Matrix(1, 1))  # appearance matrix = no-op
+                return
+
+            mat = Matrix(rot)
+            quad = self.rect.morph(M, mat)
+            self.setRect(quad.rect)
+            self.setAPNMatrix(apnmat * mat)
+
+        %}
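The color handling in `update()` maps the length of a color sequence to a PDF color operator: 1 component selects gray (`g`/`G`), 3 select RGB (`rg`/`RG`), 4 select CMYK (`k`/`K`), with lowercase for fill and uppercase for stroke. The standalone sketch below mirrors that rule outside of PyMuPDF, together with the `/DA` (default appearance) string assembled for 'FreeText' annots. The names `pdf_color_op` and `build_da` are hypothetical helpers, not PyMuPDF API; the sketch is Python 3 only and formats numbers with `%g` rather than `str()`.

```python
from typing import Optional, Sequence, Union

Color = Union[float, Sequence[float]]


def pdf_color_op(cs: Optional[Color], code: str) -> bytes:
    """Return a PDF color operator line; code 'f' means fill, anything else stroke."""
    if cs is None:
        return b""
    if isinstance(cs, (int, float)):
        comps = [float(cs)]  # a single number is a gray level
    else:
        comps = [float(c) for c in cs]
    ops = {1: ("g", "G"), 3: ("rg", "RG"), 4: ("k", "K")}
    if len(comps) not in ops:
        return b""  # unsupported number of components
    fill_op, stroke_op = ops[len(comps)]
    op = fill_op if code == "f" else stroke_op
    return (" ".join("%g" % c for c in comps) + " " + op + "\n").encode("utf-8")


def build_da(tcol: Sequence[float], fontname: str, fontsize: float) -> str:
    """Build a FreeText /DA string: non-stroking text color, then font and size."""
    color = pdf_color_op(tcol, "f").decode("utf-8").strip() or "0 g"  # default: black
    return "%s /%s %g Tf" % (color, fontname, fontsize)
```

For example, `build_da((0, 0, 1), "Helv", 12)` yields `"0 0 1 rg /Helv 12 Tf"`.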
+
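Before calling MuPDF, `update()` normalizes the `rotate` argument into the range 0 to 359 and, for 'FreeText' annots, resets any angle that is not a multiple of 90 to 0; the value -1 means "leave the rotation unchanged". A minimal sketch of that rule; `normalize_rotation` is a hypothetical name, and it uses Python's `%` operator in place of the original add/subtract loops.

```python
def normalize_rotation(rotate: int, is_free_text: bool) -> int:
    """Mirror the rotation sanitizing step of Annot.update() (sketch)."""
    if rotate == -1:  # -1 means: do not change the rotation
        return rotate
    rotate %= 360  # same result as the +/-360 loops, also for negative input
    if is_free_text and rotate % 90 != 0:
        rotate = 0  # FreeText only supports multiples of 90 degrees
    return rotate
```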
+        //---------------------------------------------------------------------
+        // annotation set colors
+        //---------------------------------------------------------------------
+        %pythonprepend setColors %{
+        """Set 'stroke' and 'fill' colors.
+
+        Use either a dict or the direct arguments.
+        """
+        CheckParent(self)
+        if type(colors) is not dict:
+            colors = {"fill": fill, "stroke": stroke}
+        %}
+        void setColors(PyObject *colors=NULL, PyObject *fill=NULL, PyObject *stroke=NULL)
+        {
+            if (!PyDict_Check(colors)) return;
+            pdf_annot *annot = (pdf_annot *) $self;
+            int type = pdf_annot_type(gctx, annot);
+            PyObject *ccol, *icol;
+            ccol = PyDict_GetItem(colors, dictkey_stroke);
+            icol = PyDict_GetItem(colors, dictkey_fill);
+            int i, n;
+            float col[4];
+            n = 0;
+            if (EXISTS(ccol)) {
+                JM_color_FromSequence(ccol, &n, col);
+                fz_try(gctx) {
+                    pdf_set_annot_color(gctx, annot, n, col);
+                }
+                fz_catch(gctx) {
+                    JM_Warning("could not set stroke color for this annot type");
+                }
+            }
+            n = 0;
+            if (EXISTS(icol)) {
+                JM_color_FromSequence(icol, &n, col);
+                if (type != PDF_ANNOT_REDACT) {
+                    fz_try(gctx) {
+                        pdf_set_annot_interior_color(gctx, annot, n, col);
+                    }
+                    fz_catch(gctx) {
+                        JM_Warning("could not set fill color for this annot type");
+                    }
+                } else {
+                    pdf_obj *arr = pdf_new_array(gctx, annot->page->doc, n);
+                    for (i = 0; i < n; i++) {
+                        pdf_array_push_real(gctx, arr, col[i]);
+                    }
+                    pdf_dict_put_drop(gctx, annot->obj, PDF_NAME(IC), arr);
+                }
+            }
+            return;
+        }
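`setColors()` accepts either a dict with 'stroke' and 'fill' keys or the two direct arguments; the Python prepend folds the direct arguments into a dict, and the C code infers the color space from the number of components. For 'Redact' annots the fill color is written straight into the /IC array. The sketch below models only the argument normalization and the resulting /IC array text; `normalize_colors` and `ic_array` are hypothetical helpers, and the allowed component counts (0, 1, 3, 4) follow the PDF specification for annotation color arrays.

```python
from typing import Any, Dict, Sequence


def normalize_colors(colors: Any = None, fill: Any = None, stroke: Any = None) -> Dict[str, Any]:
    """Accept a dict or direct arguments, as the setColors() prepend does."""
    if type(colors) is not dict:
        colors = {"fill": fill, "stroke": stroke}
    return colors


def ic_array(fill: Sequence[float]) -> str:
    """Render a fill color as PDF /IC array text (0, 1, 3 or 4 components)."""
    if len(fill) not in (0, 1, 3, 4):
        raise ValueError("color must have 0, 1, 3 or 4 components")
    return "/IC [" + " ".join("%g" % c for c in fill) + "]"
```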
+
+        //---------------------------------------------------------------------
+        // annotation lineEnds
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(lineEnds, """Line end codes.""")
+        PyObject *lineEnds()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+
+            // return nothing for invalid annot types
+            if (!pdf_annot_has_line_ending_styles(gctx, annot))
+                return_none;
+
+            int lstart = (int) pdf_annot_line_start_style(gctx, annot);
+            int lend = (int) pdf_annot_line_end_style(gctx, annot);
+            return Py_BuildValue("ii", lstart, lend);
+        }
+
+        //---------------------------------------------------------------------
+        // annotation set line ends
+        //---------------------------------------------------------------------
+        PARENTCHECK(setLineEnds, """Set line end codes.""")
+        void setLineEnds(int start, int end)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            if (pdf_annot_has_line_ending_styles(gctx, annot))
+                pdf_set_annot_line_ending_styles(gctx, annot, start, end);
+            else
+                JM_Warning("annot type has no line ends");
+        }
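`lineEnds` returns a pair of integer codes (start, end), or None for annot types without line endings. The `le_funcs` tuple in `update()` implies the code order: 0 none, 1 Square, 2 Circle, 3 Diamond, 4 OpenArrow, 5 ClosedArrow, 6 Butt, 7 ROpenArrow, 8 RClosedArrow, 9 Slash. The sketch translates codes into the corresponding PDF /LE names; `LINE_END_NAMES` and `describe_line_ends` are hypothetical, derived from that ordering.

```python
from typing import Optional, Tuple

LINE_END_NAMES = ("None", "Square", "Circle", "Diamond", "OpenArrow",
                  "ClosedArrow", "Butt", "ROpenArrow", "RClosedArrow", "Slash")


def describe_line_ends(codes: Optional[Tuple[int, int]]) -> Optional[Tuple[str, str]]:
    """Translate a lineEnds tuple into PDF /LE names; None if the type has no line ends."""
    if codes is None:
        return None

    def name(code: int) -> str:
        # unknown codes are treated like "no line end symbol"
        return LINE_END_NAMES[code] if 0 <= code < len(LINE_END_NAMES) else "None"

    start, end = codes
    return name(start), name(end)
```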
+
+        //---------------------------------------------------------------------
+        // annotation type
+        //---------------------------------------------------------------------
+        PARENTCHECK(type, """Annotation type.""")
+        %pythoncode %{@property%}
+        PyObject *type()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            int type = pdf_annot_type(gctx, annot);
+            const char *c = pdf_string_from_annot_type(gctx, type);
+            pdf_obj *o = pdf_dict_gets(gctx, annot->obj, "IT");
+            if (!o || !pdf_is_name(gctx, o))
+                return Py_BuildValue("is", type, c);         // no IT entry
+            const char *it = pdf_to_name(gctx, o);
+            return Py_BuildValue("iss", type, c, it);
+        }
+
+        //---------------------------------------------------------------------
+        // annotation opacity
+        //---------------------------------------------------------------------
+        PARENTCHECK(opacity, """Opacity.""")
+        %pythoncode %{@property%}
+        PyObject *opacity()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            double opy = -1;
+            pdf_obj *ca = pdf_dict_get(gctx, annot->obj, PDF_NAME(CA));
+            if (pdf_is_number(gctx, ca))
+                opy = pdf_to_real(gctx, ca);
+            return Py_BuildValue("f", opy);
+        }
+
+        //---------------------------------------------------------------------
+        // annotation set opacity
+        //---------------------------------------------------------------------
+        PARENTCHECK(setOpacity, """Set opacity.""")
+        void setOpacity(float opacity)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            if (!INRANGE(opacity, 0.0f, 1.0f))
+            {
+                pdf_set_annot_opacity(gctx, annot, 1);
+                return;
+            }
+            pdf_set_annot_opacity(gctx, annot, opacity);
+            if (opacity < 1.0f)
+            {
+                annot->page->transparency = 1;
+            }
+        }
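`setOpacity()` treats any value outside 0 to 1 as fully opaque and marks the page as using transparency only when the value is below 1. In `update()`, the appearance stream is prefixed with a reference to the `/H` graphics state whenever the opacity lies in [0, 1) or a blend mode is set. A sketch of both decisions; the function names are hypothetical, and the inclusive range check assumes `INRANGE` tests `low <= v <= high`.

```python
from typing import Optional, Tuple


def effective_opacity(opacity: float) -> Tuple[float, bool]:
    """Return (stored /CA value, page needs transparency) like setOpacity()."""
    if not 0.0 <= opacity <= 1.0:
        return 1.0, False  # out of range: reset to fully opaque
    return opacity, opacity < 1.0


def needs_gs_reference(opacity: float, blend_mode: Optional[str]) -> bool:
    """True if update() would prefix the appearance stream with '/H gs'."""
    return 0 <= opacity < 1 or blend_mode is not None
```

Note that the `opacity` property reports -1 when /CA is absent, which `needs_gs_reference` correctly treats as "no opacity change".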
+
+
+        //---------------------------------------------------------------------
+        // annotation get attached file info
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(fileInfo, !result)
+        PARENTCHECK(fileInfo, """Attached file information.""")
+        PyObject *fileInfo()
+        {
+            PyObject *res = PyDict_New();  // create Python dict
+            char *filename = NULL;
+            char *desc = NULL;
+            int length = -1, size = -1;
+            pdf_obj *stream = NULL, *o = NULL, *fs = NULL;
+            pdf_annot *annot = (pdf_annot *) $self;
+
+            fz_try(gctx) {
+                int type = (int) pdf_annot_type(gctx, annot);
+                if (type != PDF_ANNOT_FILE_ATTACHMENT)
+                    THROWMSG("not a file attachment annot");
+                stream = pdf_dict_getl(gctx, annot->obj, PDF_NAME(FS),
+                                   PDF_NAME(EF), PDF_NAME(F), NULL);
+                if (!stream) THROWMSG("bad PDF: file entry not found");
+            }
+            fz_catch(gctx) {
+                Py_DECREF(res);  // release the dict created above
+                return NULL;
+            }
+
+            fs = pdf_dict_get(gctx, annot->obj, PDF_NAME(FS));
+
+            o = pdf_dict_get(gctx, fs, PDF_NAME(UF));
+            if (o) {
+                filename = (char *) pdf_to_text_string(gctx, o);
+            } else {
+                o = pdf_dict_get(gctx, fs, PDF_NAME(F));
+                if (o) filename = (char *) pdf_to_text_string(gctx, o);
+            }
+
+            o = pdf_dict_get(gctx, fs, PDF_NAME(Desc));
+            if (o) desc = (char *) pdf_to_text_string(gctx, o);
+
+            o = pdf_dict_get(gctx, stream, PDF_NAME(Length));
+            if (o) length = pdf_to_int(gctx, o);
+
+            o = pdf_dict_getl(gctx, stream, PDF_NAME(Params),
+                                PDF_NAME(Size), NULL);
+            if (o) size = pdf_to_int(gctx, o);
+
+            DICT_SETITEM_DROP(res, dictkey_filename, JM_EscapeStrFromStr(filename));
+            DICT_SETITEM_DROP(res, dictkey_desc, JM_UnicodeFromStr(desc));
+            DICT_SETITEM_DROP(res, dictkey_length, Py_BuildValue("i", length));
+            DICT_SETITEM_DROP(res, dictkey_size, Py_BuildValue("i", size));
+            return res;
+        }
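`fileInfo()` prefers the Unicode file name (/UF) of the file specification over the plain /F entry, and reports -1 when /Length or /Params/Size is missing. The same lookup order, modelled on plain Python dicts standing in for PDF objects; `file_info` and the dict layout are hypothetical.

```python
from typing import Any, Dict


def file_info(fs: Dict[str, Any], stream: Dict[str, Any]) -> Dict[str, Any]:
    """Collect attachment details the way fileInfo() does (sketch on plain dicts)."""
    if "UF" in fs:  # /UF wins whenever it exists
        filename = fs["UF"]
    else:
        filename = fs.get("F", "")
    return {
        "filename": filename,
        "desc": fs.get("Desc", ""),
        "length": stream.get("Length", -1),
        "size": stream.get("Params", {}).get("Size", -1),
    }
```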
+
+        //---------------------------------------------------------------------
+        // annotation get attached file content
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(fileGet, !result)
+        PARENTCHECK(fileGet, """Attached file content.""")
+        PyObject *fileGet()
+        {
+            PyObject *res = NULL;
+            pdf_obj *stream = NULL;
+            fz_buffer *buf = NULL;
+            pdf_annot *annot = (pdf_annot *) $self;
+            fz_var(buf);
+            fz_try(gctx) {
+                int type = (int) pdf_annot_type(gctx, annot);
+                if (type != PDF_ANNOT_FILE_ATTACHMENT)
+                    THROWMSG("not a file attachment annot");
+                stream = pdf_dict_getl(gctx, annot->obj, PDF_NAME(FS),
+                                   PDF_NAME(EF), PDF_NAME(F), NULL);
+                if (!stream) THROWMSG("bad PDF: file entry not found");
+                buf = pdf_load_stream(gctx, stream);
+                res = JM_BinFromBuffer(gctx, buf);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, buf);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return res;
+        }
+
+        //---------------------------------------------------------------------
+        // annotation update attached file
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(fileUpd, !result)
+        %pythonprepend fileUpd %{
+        """Update attached file."""
+        CheckParent(self)%}
+
+        PyObject *fileUpd(PyObject *buffer=NULL, char *filename=NULL, char *ufilename=NULL, char *desc=NULL)
+        {
+            pdf_document *pdf = NULL;       // to be filled in
+            fz_buffer *res = NULL;          // new file content
+            pdf_obj *stream = NULL, *fs = NULL;
+            pdf_annot *annot = (pdf_annot *) $self;
+            fz_try(gctx) {
+                pdf = annot->page->doc;     // the owning PDF
+                int type = (int) pdf_annot_type(gctx, annot);
+                if (type != PDF_ANNOT_FILE_ATTACHMENT)
+                    THROWMSG("not a file attachment annot");
+                stream = pdf_dict_getl(gctx, annot->obj, PDF_NAME(FS),
+                                   PDF_NAME(EF), PDF_NAME(F), NULL);
+                // the object for file content
+                if (!stream) THROWMSG("bad PDF: no /EF object");
+
+                fs = pdf_dict_get(gctx, annot->obj, PDF_NAME(FS));
+
+                // file content given
+                res = JM_BufferFromBytes(gctx, buffer);
+                if (buffer && !res) THROWMSG("bad type: 'buffer'");
+                if (res) {
+                    JM_update_stream(gctx, pdf, stream, res, 1);
+                    // adjust /DL and /Size parameters
+                    int64_t len = (int64_t) fz_buffer_storage(gctx, res, NULL);
+                    pdf_obj *l = pdf_new_int(gctx, len);
+                    pdf_dict_put(gctx, stream, PDF_NAME(DL), l);
+                    pdf_dict_putl(gctx, stream, l, PDF_NAME(Params), PDF_NAME(Size), NULL);
+                    pdf_drop_obj(gctx, l);  // the dict entries keep their own references
+                }
+
+                if (filename) {
+                    pdf_dict_put_text_string(gctx, stream, PDF_NAME(F), filename);
+                    pdf_dict_put_text_string(gctx, fs, PDF_NAME(F), filename);
+                    pdf_dict_put_text_string(gctx, stream, PDF_NAME(UF), filename);
+                    pdf_dict_put_text_string(gctx, fs, PDF_NAME(UF), filename);
+                    pdf_dict_put_text_string(gctx, annot->obj, PDF_NAME(Contents), filename);
+                }
+
+                if (ufilename) {
+                    pdf_dict_put_text_string(gctx, stream, PDF_NAME(UF), ufilename);
+                    pdf_dict_put_text_string(gctx, fs, PDF_NAME(UF), ufilename);
+                }
+
+                if (desc) {
+                    pdf_dict_put_text_string(gctx, stream, PDF_NAME(Desc), desc);
+                    pdf_dict_put_text_string(gctx, fs, PDF_NAME(Desc), desc);
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf->dirty = 1;
+            return_none;
+        }
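In `fileUpd()`, a new `filename` is written to both /F and /UF of the embedded stream and of the file specification (and to the annot's /Contents), while `ufilename`, applied afterwards, overrides only the /UF entries. The sketch reproduces that precedence on plain dicts; `apply_file_names` is a hypothetical helper.

```python
from typing import Dict, Optional


def apply_file_names(stream: Dict[str, str], fs: Dict[str, str], annot: Dict[str, str],
                     filename: Optional[str] = None, ufilename: Optional[str] = None,
                     desc: Optional[str] = None) -> None:
    """Update name entries in place, following the order used by fileUpd()."""
    if filename:
        for d in (stream, fs):
            d["F"] = filename
            d["UF"] = filename
        annot["Contents"] = filename
    if ufilename:  # applied last, so it wins for /UF
        stream["UF"] = ufilename
        fs["UF"] = ufilename
    if desc:
        stream["Desc"] = desc
        fs["Desc"] = desc
```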
+
+        //---------------------------------------------------------------------
+        // annotation info
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(info, """Various information details.""")
+        PyObject *info()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            PyObject *res = PyDict_New();
+            pdf_obj *o;
+
+            DICT_SETITEM_DROP(res, dictkey_content,
+                          JM_UnicodeFromStr(pdf_annot_contents(gctx, annot)));
+
+            o = pdf_dict_get(gctx, annot->obj, PDF_NAME(Name));
+            DICT_SETITEM_DROP(res, dictkey_name, JM_UnicodeFromStr(pdf_to_name(gctx, o)));
+
+            // Title (= author)
+            o = pdf_dict_get(gctx, annot->obj, PDF_NAME(T));
+            DICT_SETITEM_DROP(res, dictkey_title, JM_UnicodeFromStr(pdf_to_text_string(gctx, o)));
+
+            // CreationDate
+            o = pdf_dict_gets(gctx, annot->obj, "CreationDate");
+            DICT_SETITEM_DROP(res, dictkey_creationDate,
+                          JM_UnicodeFromStr(pdf_to_text_string(gctx, o)));
+
+            // ModDate
+            o = pdf_dict_get(gctx, annot->obj, PDF_NAME(M));
+            DICT_SETITEM_DROP(res, dictkey_modDate, JM_UnicodeFromStr(pdf_to_text_string(gctx, o)));
+
+            // Subj
+            o = pdf_dict_gets(gctx, annot->obj, "Subj");
+            DICT_SETITEM_DROP(res, dictkey_subject,
+                          JM_UnicodeFromStr(pdf_to_text_string(gctx, o)));
+
+            // Identification (PDF key /NM)
+            o = pdf_dict_gets(gctx, annot->obj, "NM");
+            DICT_SETITEM_DROP(res, dictkey_id,
+                          JM_UnicodeFromStr(pdf_to_text_string(gctx, o)));
+
+            return res;
+        }
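`info` maps PDF annotation keys to the keys of the returned dict: /Contents to 'content', /Name to 'name', /T to 'title', /CreationDate to 'creationDate', /M to 'modDate', /Subj to 'subject' and /NM to 'id'; missing entries come back as empty strings. A table-driven sketch of that mapping on a plain dict; `INFO_KEYS` and `annot_info` are hypothetical.

```python
from typing import Any, Dict

INFO_KEYS = {
    "Contents": "content", "Name": "name", "T": "title",
    "CreationDate": "creationDate", "M": "modDate", "Subj": "subject", "NM": "id",
}


def annot_info(annot_dict: Dict[str, Any]) -> Dict[str, str]:
    """Build the info dict; absent PDF keys become empty strings."""
    return {py_key: str(annot_dict.get(pdf_key, "")) for pdf_key, py_key in INFO_KEYS.items()}
```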
+
+        //---------------------------------------------------------------------
+        // annotation set information
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(setInfo, !result)
+        %pythonprepend setInfo %{
+        """Set various properties."""
+        CheckParent(self)
+        if type(info) is dict:  # build the args from the dictionary
+            content = info.get("content", None)
+            title = info.get("title", None)
+            creationDate = info.get("creationDate", None)
+            modDate = info.get("modDate", None)
+            subject = info.get("subject", None)
+            info = None
+        %}
+        PyObject *setInfo(PyObject *info=NULL, char *content=NULL, char *title=NULL,
+                          char *creationDate=NULL, char *modDate=NULL, char *subject=NULL)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            // use this to indicate a 'markup' annot type
+            int is_markup = pdf_annot_has_author(gctx, annot);
+            fz_try(gctx) {
+                // contents
+                if (content)
+                    pdf_set_annot_contents(gctx, annot, content);
+
+                if (is_markup) {
+                    // title (= author)
+                    if (title)
+                        pdf_set_annot_author(gctx, annot, title);
+
+                    // creation date
+                    if (creationDate)
+                        pdf_dict_put_text_string(gctx, annot->obj,
+                                                 PDF_NAME(CreationDate), creationDate);
+
+                    // mod date
+                    if (modDate)
+                        pdf_dict_put_text_string(gctx, annot->obj,
+                                                 PDF_NAME(M), modDate);
+
+                    // subject
+                    if (subject)
+                        pdf_dict_puts_drop(gctx, annot->obj, "Subj",
+                                           pdf_new_text_string(gctx, subject));
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
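`setInfo()` stores `creationDate` and `modDate` verbatim as text strings, so callers should pass PDF date strings, in the widely used layout `D:YYYYMMDDHHmmSS` followed by a UTC offset such as `+01'00'` (or `Z` for UTC). A sketch that builds such a string from a timezone-aware `datetime`; `pdf_date` is a hypothetical helper, not a PyMuPDF function.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format an aware datetime as a PDF date string, e.g. D:20200807120311+01'00'."""
    if dt.tzinfo is None:
        raise ValueError("need a timezone-aware datetime")
    minutes = int(dt.utcoffset().total_seconds()) // 60
    if minutes == 0:
        tz = "Z"
    else:
        sign = "+" if minutes > 0 else "-"
        hh, mm = divmod(abs(minutes), 60)
        tz = "%s%02d'%02d'" % (sign, hh, mm)
    return dt.strftime("D:%Y%m%d%H%M%S") + tz
```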
+
+        //---------------------------------------------------------------------
+        // annotation border
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(border, """Border information.""")
+        PyObject *border()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            return JM_annot_border(gctx, annot->obj);
+        }
+
+        //---------------------------------------------------------------------
+        // set annotation border
+        //---------------------------------------------------------------------
+        %pythonprepend setBorder %{
+        """Set border properties.
+
+        Either a dict, or direct arguments width, style and dashes."""
+        CheckParent(self)
+        if type(border) is not dict:
+            border = {"width": width, "style": style, "dashes": dashes}
+        %}
+        PyObject *setBorder(PyObject *border=NULL, float width=0, char *style=NULL, PyObject *dashes=NULL)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            return JM_annot_set_border(gctx, border, annot->page->doc, annot->obj);
+        }
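The border dict handled by `border` and `setBorder()` carries 'width', 'style' and 'dashes'. When `update()` rewrites the appearance stream, a positive width becomes a `w` operator and a dash sequence becomes a `d` operator with phase 0, for example `[3 2] 0 d`. A sketch of those two operator lines; `border_ops` is hypothetical.

```python
from typing import Optional, Sequence


def border_ops(width: float, dashes: Optional[Sequence[int]]) -> bytes:
    """Return the line width and dash pattern operators for a border spec."""
    out = b""
    if width > 0:
        out += b"%g w\n" % width
    if dashes:
        out += ("[" + " ".join(map(str, dashes)) + "] 0 d\n").encode("utf-8")
    return out
```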
+
+        //---------------------------------------------------------------------
+        // annotation flags
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        PARENTCHECK(flags, """Flags field.""")
+        int flags()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            return pdf_annot_flags(gctx, annot);
+        }
+
+        //---------------------------------------------------------------------
+        // annotation clean contents
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_cleanContents, !result)
+        PARENTCHECK(_cleanContents, """Clean appearance contents object.""")
+        PyObject *_cleanContents()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            pdf_filter_options filter = {
+                NULL,  // opaque
+                NULL,  // image filter
+                NULL,  // text filter
+                NULL,  // after text
+                NULL,  // end page
+                1,     // recurse: true
+                1,     // instance forms
+                1,     // only sanitize, no filtering
+                0      // do not ascii-escape binary data
+                };
+            fz_try(gctx) {
+                pdf_filter_annot_contents(gctx, annot->page->doc, annot, &filter);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf_dirty_annot(gctx, annot);
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // set annotation flags
+        //---------------------------------------------------------------------
+        PARENTCHECK(setFlags, """Set annotation flags.""")
+        void setFlags(int flags)
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            pdf_set_annot_flags(gctx, annot, flags);
+        }
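`flags` and `setFlags()` read and write the annotation's /F integer, a bit field defined by the PDF specification (1 Invisible, 2 Hidden, 4 Print, 8 NoZoom, 16 NoRotate, 32 NoView, 64 ReadOnly, 128 Locked, 256 ToggleNoView, 512 LockedContents). Because `setFlags()` replaces the whole value, changing a single bit means combining it with the current flags first. The `AnnotFlag` class and `with_flag` function below are local illustrations, not PyMuPDF's own constants.

```python
from enum import IntFlag


class AnnotFlag(IntFlag):
    INVISIBLE = 1
    HIDDEN = 2
    PRINT = 4
    NO_ZOOM = 8
    NO_ROTATE = 16
    NO_VIEW = 32
    READ_ONLY = 64
    LOCKED = 128
    TOGGLE_NO_VIEW = 256
    LOCKED_CONTENTS = 512


def with_flag(current: int, flag: AnnotFlag, on: bool = True) -> int:
    """Return a new /F value with one bit set or cleared, keeping all others."""
    # plain int arithmetic avoids IntFlag's version-dependent '~' behavior
    return current | int(flag) if on else current & ~int(flag)
```

Usage would look like `annot.setFlags(with_flag(annot.flags, AnnotFlag.PRINT))`.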
+
+        //---------------------------------------------------------------------
+        // annotation delete responses
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(delete_responses, !result)
+        PARENTCHECK(delete_responses, """Delete responding annotations.""")
+        PyObject *delete_responses()
+        {
+            pdf_annot *annot = (pdf_annot *) $self;
+            pdf_page *page = annot->page;
+            pdf_annot *irt_annot = NULL;
+            fz_try(gctx) {
+                while (1) {
+                    irt_annot = JM_find_annot_irt(gctx, annot);
+                    if (!irt_annot)
+                        break;
+                    JM_delete_annot(gctx, page, irt_annot);
+                }
+                pdf_dict_del(gctx, annot->obj, PDF_NAME(Popup));
+                pdf_obj *annots = pdf_dict_get(gctx, page->obj, PDF_NAME(Annots));
+                int i, n = pdf_array_len(gctx, annots), found = 0;
+                for (i = n - 1; i >= 0; i--) {
+                    pdf_obj *o = pdf_array_get(gctx, annots, i);
+                    pdf_obj *p = pdf_dict_get(gctx, o, PDF_NAME(Parent));
+                    if (!p)
+                        continue;
+                    if (!pdf_objcmp(gctx, p, annot->obj)) {
+                        pdf_array_delete(gctx, annots, i);
+                        found = 1;
+                    }
+                }
+                if (found > 0) {
+                    pdf_dict_put(gctx, page->obj, PDF_NAME(Annots), annots);
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            pdf_dirty_annot(gctx, annot);
+            return_none;
+        }
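`delete_responses()` first deletes, one at a time, every annot whose /IRT ("in reply to") entry points at this annot, then drops its /Popup entry and removes from the page's /Annots array every entry whose /Parent is this annot, walking the array backwards so that deletions do not shift indices still to be visited. The same algorithm on a list of plain dicts; the 'id', 'irt', 'parent' and 'popup' keys are a hypothetical stand-in for PDF objects.

```python
from typing import Any, Dict, List


def delete_responses(annots: List[Dict[str, Any]], target: Dict[str, Any]) -> None:
    """Remove direct replies (/IRT) and popups (/Parent) of target, in place."""
    # step 1: delete replies until none is left
    while True:
        idx = next((i for i, a in enumerate(annots) if a.get("irt") == target["id"]), None)
        if idx is None:
            break
        del annots[idx]
    # step 2: forget the /Popup reference
    target.pop("popup", None)
    # step 3: walk backwards so deleting does not skip entries
    for i in range(len(annots) - 1, -1, -1):
        if annots[i].get("parent") == target["id"]:
            del annots[i]
```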
+
+        //---------------------------------------------------------------------
+        // next annotation
+        //---------------------------------------------------------------------
+        PARENTCHECK(next, """Next annotation.""")
+        %pythonappend next %{
+        if not val:
+            return None
+        val.thisown = True
+        val.parent = self.parent  # copy owning page object from previous annot
+        val.parent._annot_refs[id(val)] = val
+
+        if val.type[0] == PDF_ANNOT_WIDGET:
+            widget = Widget()
+            TOOLS._fill_widget(val, widget)
+            val = widget
+        %}
+        %pythoncode %{@property%}
+        struct Annot *next()
+        {
+            pdf_annot *this_annot = (pdf_annot *) $self;
+            int type = pdf_annot_type(gctx, this_annot);
+            pdf_annot *annot;
+
+            if (type != PDF_ANNOT_WIDGET)
+            {
+                annot = pdf_next_annot(gctx, this_annot);
+            }
+            else
+            {
+                annot = (pdf_widget *) pdf_next_widget(gctx, (pdf_widget *) this_annot);
+            }
+
+            if (annot)
+                pdf_keep_annot(gctx, annot);
+            return (struct Annot *) annot;
+        }
+
+
+        //---------------------------------------------------------------------
+        // annotation pixmap
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(getPixmap, !result)
+        PARENTCHECK(getPixmap, """Annotation Pixmap.""")
+        %pythonprepend getPixmap
+%{"""Annotation Pixmap."""
+
+CheckParent(self)
+cspaces = {"gray": csGRAY, "rgb": csRGB, "cmyk": csCMYK}
+if type(colorspace) is str:
+    colorspace = cspaces.get(colorspace.lower(), None)
+%}
+        struct Pixmap *
+        getPixmap(PyObject *matrix = NULL, struct Colorspace *colorspace = NULL, int alpha = 0)
+        {
+            fz_matrix ctm = JM_matrix_from_py(matrix);
+            fz_colorspace *cs = (fz_colorspace *) colorspace;
+            fz_pixmap *pix = NULL;
+            if (!cs) {
+                cs = fz_device_rgb(gctx);
+            }
+
+            fz_try(gctx) {
+                pix = pdf_new_pixmap_from_annot(gctx, (pdf_annot *) $self, ctm, cs, NULL, alpha);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pix;
+        }
+
+
+        %pythoncode %{
+        def _erase(self):
+            try:
+                self.parent._forget_annot(self)
+            except:
+                return
+            if getattr(self, "thisown", False):
+                self.__swig_destroy__(self)
+                self.thisown = False
+            self.parent = None
+
+        def __str__(self):
+            CheckParent(self)
+            return "'%s' annotation on %s" % (self.type[1], str(self.parent))
+
+        def __repr__(self):
+            CheckParent(self)
+            return "'%s' annotation on %s" % (self.type[1], str(self.parent))
+
+        def __del__(self):
+            if self.parent is None:
+                return
+            self._erase()%}
+    }
+};
+%clearnodefaultctor;
+
+//-----------------------------------------------------------------------------
+// fz_link
+//-----------------------------------------------------------------------------
+%nodefaultctor;
+struct Link
+{
+    %immutable;
+    %extend {
+        ~Link() {
+            DEBUGMSG1("Link");
+            fz_drop_link(gctx, (fz_link *) $self);
+            DEBUGMSG2;
+        }
+
+        PyObject *_border(struct Document *doc, int xref)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) doc);
+            if (!pdf) return_none;
+            pdf_obj *link_obj = pdf_new_indirect(gctx, pdf, xref, 0);
+            if (!link_obj) return_none;
+            PyObject *b = JM_annot_border(gctx, link_obj);
+            pdf_drop_obj(gctx, link_obj);
+            return b;
+        }
+
+        PyObject *_setBorder(PyObject *border, struct Document *doc, int xref)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) doc);
+            if (!pdf) return_none;
+            pdf_obj *link_obj = pdf_new_indirect(gctx, pdf, xref, 0);
+            if (!link_obj) return_none;
+            PyObject *b = JM_annot_set_border(gctx, border, pdf, link_obj);
+            pdf_drop_obj(gctx, link_obj);
+            return b;
+        }
+
+        PyObject *_colors(struct Document *doc, int xref)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) doc);
+            if (!pdf) return_none;
+            pdf_obj *link_obj = pdf_new_indirect(gctx, pdf, xref, 0);
+            if (!link_obj) return_none;
+            PyObject *b = JM_annot_colors(gctx, link_obj);
+            pdf_drop_obj(gctx, link_obj);
+            return b;
+        }
+
+        PyObject *_setColors(PyObject *colors, struct Document *doc, int xref)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) doc);
+            pdf_obj *arr = NULL;
+            int i;
+            if (!pdf) return_none;
+            if (!PyDict_Check(colors)) return_none;
+            float scol[4] = {0.0f, 0.0f, 0.0f, 0.0f};
+            int nscol = 0;
+            float fcol[4] = {0.0f, 0.0f, 0.0f, 0.0f};
+            int nfcol = 0;
+            PyObject *stroke = PyDict_GetItem(colors, dictkey_stroke);
+            PyObject *fill = PyDict_GetItem(colors, dictkey_fill);
+            JM_color_FromSequence(stroke, &nscol, scol);
+            JM_color_FromSequence(fill, &nfcol, fcol);
+            if (!nscol && !nfcol) return_none;
+            pdf_obj *link_obj = pdf_new_indirect(gctx, pdf, xref, 0);
+            if (!link_obj) return_none;
+            if (nscol > 0)
+            {
+                arr = pdf_new_array(gctx, pdf, nscol);
+                for (i = 0; i < nscol; i++)
+                    pdf_array_push_real(gctx, arr, scol[i]);
+                pdf_dict_put_drop(gctx, link_obj, PDF_NAME(C), arr);
+            }
+            if (nfcol > 0) JM_Warning("this annot type has no fill color");
+            pdf_drop_obj(gctx, link_obj);
+            return_none;
+        }
+
+        %pythoncode %{
+        @property
+        def border(self):
+            return self._border(self.parent.parent.this, self.xref)
+
+        def setBorder(self, border=None, width=0, dashes=None, style=None):
+            if type(border) is not dict:
+                border = {"width": width, "style": style, "dashes": dashes}
+            return self._setBorder(border, self.parent.parent.this, self.xref)
+
+        @property
+        def colors(self):
+            return self._colors(self.parent.parent.this, self.xref)
+
+        def setColors(self, colors=None, stroke=None, fill=None):
+            if type(colors) is not dict:
+                colors = {"fill": fill, "stroke": stroke}
+            return self._setColors(colors, self.parent.parent.this, self.xref)
+        %}
+        %pythoncode %{@property%}
+        PARENTCHECK(uri, """Uri string.""")
+        PyObject *uri()
+        {
+            fz_link *this_link = (fz_link *) $self;
+            return JM_UnicodeFromStr(this_link->uri);
+        }
+
+        %pythoncode %{@property%}
+        PARENTCHECK(isExternal, """External indicator.""")
+        PyObject *isExternal()
+        {
+            fz_link *this_link = (fz_link *) $self;
+            if (!this_link->uri) Py_RETURN_FALSE;
+            return JM_BOOL(fz_is_external_link(gctx, this_link->uri));
+        }
+
+        %pythoncode
+        %{
+        page = -1
+        @property
+        def dest(self):
+            """Create link destination details."""
+            if hasattr(self, "parent") and self.parent is None:
+                raise ValueError("orphaned object: parent is None")
+            if self.parent.parent.isClosed or self.parent.parent.isEncrypted:
+                raise ValueError("document closed or encrypted")
+            doc = self.parent.parent
+
+            if self.isExternal or self.uri.startswith("#"):
+                uri = None
+            else:
+                uri = doc.resolveLink(self.uri)
+
+            return linkDest(self, uri)
+        %}
+
+        PARENTCHECK(rect, """Rectangle ('hot area').""")
+        %pythoncode %{@property%}
+        %pythonappend rect %{val = Rect(val)%}
+        PyObject *rect()
+        {
+            fz_link *this_link = (fz_link *) $self;
+            return JM_py_from_rect(this_link->rect);
+        }
+
+        //---------------------------------------------------------------------
+        // next link
+        //---------------------------------------------------------------------
+        // increase the reference count of the next link
+        // so that it is not freed when the list head is dropped
+        PARENTCHECK(next, """Next link.""")
+        %pythonappend next %{
+            if val:
+                val.thisown = True
+                val.parent = self.parent  # copy owning page from prev link
+                val.parent._annot_refs[id(val)] = val
+                if self.xref > 0:  # prev link has an xref
+                    link_xrefs = self.parent._getLinkXrefs()
+                    idx = link_xrefs.index(self.xref)
+                    val.xref = link_xrefs[idx + 1]
+                else:
+                    val.xref = 0
+        %}
+        %pythoncode %{@property%}
+        struct Link *next()
+        {
+            fz_link *this_link = (fz_link *) $self;
+            fz_link *next_link = this_link->next;
+            if (!next_link) return NULL;
+            next_link = fz_keep_link(gctx, next_link);
+            return (struct Link *) next_link;
+        }
+
+        %pythoncode %{
+        def _erase(self):
+            try:
+                self.parent._forget_annot(self)
+            except:
+                pass
+            if getattr(self, "thisown", False):
+                self.__swig_destroy__(self)
+            self.parent = None
+            self.thisown = False
+
+        def __str__(self):
+            CheckParent(self)
+            return "link on " + str(self.parent)
+
+        def __repr__(self):
+            CheckParent(self)
+            return "link on " + str(self.parent)
+
+        def __del__(self):
+            self._erase()%}
+
+    }
+};
+%clearnodefaultctor;
+
+//-----------------------------------------------------------------------------
+// fz_display_list
+//-----------------------------------------------------------------------------
+struct DisplayList {
+    %extend
+    {
+        ~DisplayList() {
+            DEBUGMSG1("DisplayList");
+            fz_drop_display_list(gctx, (fz_display_list *) $self);
+            DEBUGMSG2;
+        }
+        FITZEXCEPTION(DisplayList, !result)
+        DisplayList(PyObject *mediabox)
+        {
+            fz_display_list *dl = NULL;
+            fz_try(gctx) {
+                dl = fz_new_display_list(gctx, JM_rect_from_py(mediabox));
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct DisplayList *) dl;
+        }
+
+        FITZEXCEPTION(run, !result)
+        PyObject *run(struct DeviceWrapper *dw, PyObject *m, PyObject *area) {
+            fz_try(gctx) {
+                fz_run_display_list(gctx, (fz_display_list *) $self, dw->device,
+                    JM_matrix_from_py(m), JM_rect_from_py(area), NULL);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // DisplayList.rect
+        //---------------------------------------------------------------------
+        %pythoncode%{@property%}
+        %pythonappend rect %{val = Rect(val)%}
+        PyObject *rect()
+        {
+            return JM_py_from_rect(fz_bound_display_list(gctx, (fz_display_list *) $self));
+        }
+
+        //---------------------------------------------------------------------
+        // DisplayList.getPixmap
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(getPixmap, !result)
+        struct Pixmap *getPixmap(PyObject *matrix=NULL,
+                                      struct Colorspace *colorspace=NULL,
+                                      int alpha=1,
+                                      PyObject *clip=NULL)
+        {
+            fz_colorspace *cs = NULL;
+            fz_pixmap *pix = NULL;
+
+            if (colorspace) cs = (fz_colorspace *) colorspace;
+            else cs = fz_device_rgb(gctx);
+
+            fz_try(gctx) {
+                pix = JM_pixmap_from_display_list(gctx,
+                          (fz_display_list *) $self, matrix, cs,
+                           alpha, clip, NULL);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Pixmap *) pix;
+        }
+
+        //---------------------------------------------------------------------
+        // DisplayList.getTextPage
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(getTextPage, !result)
+        struct TextPage *getTextPage(int flags = 3)
+        {
+            fz_display_list *this_dl = (fz_display_list *) $self;
+            fz_stext_page *tp = NULL;
+            fz_try(gctx) {
+                fz_stext_options stext_options = { 0 };
+                stext_options.flags = flags;
+                tp = fz_new_stext_page_from_display_list(gctx, this_dl, &stext_options);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct TextPage *) tp;
+        }
+        %pythoncode %{
+        def __del__(self):
+            if not type(self) is DisplayList: return
+            self.__swig_destroy__(self)
+        %}
+    }
+};
+
+//-----------------------------------------------------------------------------
+// fz_stext_page
+//-----------------------------------------------------------------------------
+struct TextPage {
+    %extend {
+        ~TextPage()
+        {
+            DEBUGMSG1("TextPage");
+            fz_drop_stext_page(gctx, (fz_stext_page *) $self);
+            DEBUGMSG2;
+        }
+
+        FITZEXCEPTION(TextPage, !result)
+        TextPage(PyObject *mediabox)
+        {
+            fz_stext_page *tp = NULL;
+            fz_try(gctx) {
+                tp = fz_new_stext_page(gctx, JM_rect_from_py(mediabox));
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct TextPage *) tp;
+        }
+
+        //---------------------------------------------------------------------
+        // method search()
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(search, !result)
+        %pythonprepend search
+        %{"""Locate up to 'hit_max' 'needle' occurrences returning rects or quads."""%}
+        %pythonappend search %{
+        if not val:
+            return val
+        newval = []
+        for v in val:
+            q = Quad(v)
+            if quads:
+                newval.append(q)
+            else:
+                newval.append(q.rect)
+        val = newval
+        %}
+        PyObject *search(const char *needle, int hit_max=16, int quads=1)
+        {
+            fz_quad *result = NULL;
+            PyObject *liste = NULL;
+            int i, mymax = hit_max;
+            if (mymax < 1) mymax = 16;
+            fz_try(gctx) {
+                liste = PyList_New(0);
+                result = JM_Alloc(fz_quad, (mymax + 1));
+                fz_quad *quad = (fz_quad *) result;
+                int count = fz_search_stext_page(gctx, (fz_stext_page *) $self, needle, result, mymax);
+                for (i = 0; i < count; i++) {
+                    LIST_APPEND_DROP(liste, JM_py_from_quad(*quad));
+                    quad += 1;
+                }
+            }
+            fz_always(gctx) {
+                JM_Free(result);
+            }
+            fz_catch(gctx) {
+                Py_CLEAR(liste);
+                return PyList_New(0);
+            }
+            return liste;
+        }
+
+
+        //---------------------------------------------------------------------
+        // Fill a Python dict with all blocks of the page (block type, bbox, content)
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_getNewBlockList, !result)
+        PyObject *
+        _getNewBlockList(PyObject *page_dict, int raw)
+        {
+            fz_try(gctx) {
+                JM_make_textpage_dict(gctx, (fz_stext_page *) $self, page_dict, raw);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        %pythoncode %{
+        def _textpage_dict(self, raw = False):
+            page_dict = {"width": self.rect.width, "height": self.rect.height}
+            self._getNewBlockList(page_dict, raw)
+            return page_dict
+        %}
+
+
+        //---------------------------------------------------------------------
+        // Get text blocks with their bbox and concatenated lines
+        // as a Python list
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(extractBLOCKS, !result)
+        %pythonprepend extractBLOCKS
+        %{"""Fill a given list with text block information."""%}
+        PyObject *
+        extractBLOCKS(PyObject *lines)
+        {
+            fz_stext_block *block;
+            fz_stext_line *line;
+            fz_stext_char *ch;
+            int block_n = 0;
+            PyObject *text = NULL, *litem;
+            fz_buffer *res = NULL;
+            fz_var(res);
+            fz_stext_page *this_tpage = (fz_stext_page *) $self;
+
+            fz_try(gctx) {
+                res = fz_new_buffer(gctx, 1024);
+                for (block = this_tpage->first_block; block; block = block->next) {
+                    fz_rect blockrect = block->bbox;
+                    if (block->type == FZ_STEXT_BLOCK_TEXT) {
+                        fz_clear_buffer(gctx, res);  // set text buffer to empty
+                        int line_n = 0;
+                        float last_y0 = 0.0;
+                        for (line = block->u.t.first_line; line; line = line->next) {
+                            fz_rect linerect = line->bbox;
+                            // separate each line after the first: new-line if it
+                            // starts at a different y0, else a space
+                            if (line_n > 0) {
+                                if (linerect.y0 != last_y0)
+                                    fz_append_string(gctx, res, "\n");
+                                else
+                                    fz_append_string(gctx, res, " ");
+                            }
+                            last_y0 = linerect.y0;
+                            line_n++;
+                            for (ch = line->first_char; ch; ch = ch->next) {
+                                JM_append_rune(gctx, res, ch->c);
+                                linerect = fz_union_rect(linerect, JM_char_bbox(line, ch));
+                            }
+                            blockrect = fz_union_rect(blockrect, linerect);
+                        }
+                        text = JM_EscapeStrFromBuffer(gctx, res);
+                    } else {
+                        fz_image *img = block->u.i.image;
+                        fz_colorspace *cs = img->colorspace;
+                        text = PyUnicode_FromFormat("<image: %s, width %d, height %d, bpc %d>", fz_colorspace_name(gctx, cs), img->w, img->h, img->bpc);
+                        blockrect = fz_union_rect(blockrect, block->bbox);
+                    }
+                    litem = PyTuple_New(7);
+                    PyTuple_SET_ITEM(litem, 0, Py_BuildValue("f", blockrect.x0));
+                    PyTuple_SET_ITEM(litem, 1, Py_BuildValue("f", blockrect.y0));
+                    PyTuple_SET_ITEM(litem, 2, Py_BuildValue("f", blockrect.x1));
+                    PyTuple_SET_ITEM(litem, 3, Py_BuildValue("f", blockrect.y1));
+                    PyTuple_SET_ITEM(litem, 4, Py_BuildValue("O", text));
+                    PyTuple_SET_ITEM(litem, 5, Py_BuildValue("i", block_n));
+                    PyTuple_SET_ITEM(litem, 6, Py_BuildValue("i", block->type));
+                    LIST_APPEND_DROP(lines, litem);
+                    Py_DECREF(text);
+                    block_n++;
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+                PyErr_Clear();
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // Get text words with their bbox
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(extractWORDS, !result)
+        %pythonprepend extractWORDS
+        %{"""Fill a list with text word information."""%}
+        PyObject *
+        extractWORDS(PyObject *lines)
+        {
+            fz_stext_block *block;
+            fz_stext_line *line;
+            fz_stext_char *ch;
+            fz_buffer *buff = NULL;
+            fz_var(buff);
+            size_t buflen = 0;
+            int block_n = 0, line_n, word_n;
+            fz_rect wbbox = {0,0,0,0};  // word bbox
+            fz_stext_page *this_tpage = (fz_stext_page *) $self;
+
+            fz_try(gctx) {
+                buff = fz_new_buffer(gctx, 64);
+                for (block = this_tpage->first_block; block; block = block->next) {
+                    if (block->type != FZ_STEXT_BLOCK_TEXT) {
+                        block_n++;
+                        continue;
+                    }
+                    line_n = 0;
+                    for (line = block->u.t.first_line; line; line = line->next) {
+                        word_n = 0;                       // word counter per line
+                        fz_clear_buffer(gctx, buff);      // reset word buffer
+                        buflen = 0;                       // reset char counter
+                        for (ch = line->first_char; ch; ch = ch->next) {
+                            if (ch->c == 32 && buflen == 0)
+                                continue;                 // skip spaces at line start
+                            if (ch->c == 32) {
+                                word_n = JM_append_word(gctx, lines, buff, &wbbox,
+                                                        block_n, line_n, word_n);
+                                fz_clear_buffer(gctx, buff);
+                                buflen = 0;               // reset char counter
+                                continue;
+                            }
+                            // append one unicode character to the word
+                            JM_append_rune(gctx, buff, ch->c);
+                            buflen++;
+                            // enlarge word bbox
+                            wbbox = fz_union_rect(wbbox, JM_char_bbox(line, ch));
+                        }
+                        if (buflen) {
+                            word_n = JM_append_word(gctx, lines, buff, &wbbox,
+                                                    block_n, line_n, word_n);
+                            fz_clear_buffer(gctx, buff);
+                            buflen = 0;
+                        }
+                        line_n++;
+                    }
+                    block_n++;
+                }
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, buff);
+                PyErr_Clear();
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+        //---------------------------------------------------------------------
+        // TextPage rectangle
+        //---------------------------------------------------------------------
+        %pythoncode %{@property%}
+        %pythonprepend rect
+        %{"""Page rectangle."""%}
+        %pythonappend rect %{val = Rect(val)%}
+        PyObject *rect()
+        {
+            fz_stext_page *this_tpage = (fz_stext_page *) $self;
+            fz_rect mediabox = this_tpage->mediabox;
+            return JM_py_from_rect(mediabox);
+        }
+        //---------------------------------------------------------------------
+        // method _extractText()
+        //---------------------------------------------------------------------
+        FITZEXCEPTION(_extractText, !result)
+        %newobject _extractText;
+        PyObject *_extractText(int format)
+        {
+            fz_buffer *res = NULL;
+            fz_output *out = NULL;
+            PyObject *text = NULL;
+            fz_var(res);
+            fz_var(out);
+            fz_stext_page *this_tpage = (fz_stext_page *) $self;
+            fz_try(gctx) {
+                res = fz_new_buffer(gctx, 1024);
+                out = fz_new_output_with_buffer(gctx, res);
+                switch(format) {
+                    case(1):
+                        fz_print_stext_page_as_html(gctx, out, this_tpage, 0);
+                        break;
+                    case(3):
+                        fz_print_stext_page_as_xml(gctx, out, this_tpage, 0);
+                        break;
+                    case(4):
+                        fz_print_stext_page_as_xhtml(gctx, out, this_tpage, 0);
+                        break;
+                    default:
+                        JM_print_stext_page_as_text(gctx, out, this_tpage);
+                        text = JM_EscapeStrFromBuffer(gctx, res);
+                        break;
+                }
+                if (!text) text = JM_EscapeStrFromBuffer(gctx, res);
+
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+                fz_drop_output(gctx, out);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return text;
+        }
+        %pythoncode %{
+            def extractText(self):
+                """Return simple, bare text on the page."""
+                return self._extractText(0)
+
+            def extractHTML(self):
+                """Return page content as an HTML string."""
+                return self._extractText(1)
+
+            def extractJSON(self):
+                """Return 'extractDICT' converted to JSON format."""
+                import base64, json
+                val = self._textpage_dict(raw=False)
+
+                class b64encode(json.JSONEncoder):
+                    def default(self, s):
+                        if not fitz_py2 and type(s) is bytes:
+                            return base64.b64encode(s).decode()
+                        if type(s) is bytearray:
+                            if fitz_py2:
+                                return base64.b64encode(s)
+                            else:
+                                return base64.b64encode(s).decode()
+                        # let the base class raise TypeError for unsupported types
+                        return json.JSONEncoder.default(self, s)
+
+                val = json.dumps(val, separators=(",", ":"), cls=b64encode, indent=1)
+
+                return val
+
+            def extractXML(self):
+                """Return page content as an XML string."""
+                return self._extractText(3)
+
+            def extractXHTML(self):
+                """Return page content as an XHTML string."""
+                return self._extractText(4)
+
+            def extractDICT(self):
+                """Return page content as a Python dict of images and text spans."""
+                return self._textpage_dict(raw=False)
+
+            def extractRAWDICT(self):
+                """Return page content as a Python dict of images and text characters."""
+                return self._textpage_dict(raw=True)
+
+            def __del__(self):
+                if not type(self) is TextPage: return
+                self.__swig_destroy__(self)
+        %}
+    }
+};
+
+//-----------------------------------------------------------------------------
+// Graftmap - only internally used for optimizing PDF object copy operations
+//-----------------------------------------------------------------------------
+struct Graftmap
+{
+    %extend
+    {
+        ~Graftmap()
+        {
+            DEBUGMSG1("Graftmap");
+            pdf_drop_graft_map(gctx, (pdf_graft_map *) $self);
+            DEBUGMSG2;
+        }
+
+        FITZEXCEPTION(Graftmap, !result)
+        Graftmap(struct Document *doc)
+        {
+            pdf_graft_map *map = NULL;
+            fz_try(gctx) {
+                pdf_document *dst = pdf_specifics(gctx, (fz_document *) doc);
+                ASSERT_PDF(dst);
+                map = pdf_new_graft_map(gctx, dst);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Graftmap *) map;
+        }
+        %pythoncode %{
+        def __del__(self):
+            if not type(self) is Graftmap:
+                return
+            self.__swig_destroy__(self)
+        %}
+    }
+};
+
+
+//-----------------------------------------------------------------------------
+// TextWriter
+//-----------------------------------------------------------------------------
+struct TextWriter
+{
+    %extend {
+        ~TextWriter()
+        {
+            DEBUGMSG1("TextWriter");
+            fz_drop_text(gctx, (fz_text *) $self);
+            DEBUGMSG2;
+        }
+
+        FITZEXCEPTION(TextWriter, !result)
+        %pythonprepend TextWriter
+        %{"""Stores text spans for later output on compatible PDF pages."""%}
+        %pythonappend TextWriter %{
+        self.opacity = opacity
+        self.color = color
+        self.rect = Rect(page_rect)
+        self.ctm = Matrix(1, 0, 0, -1, 0, self.rect.height)
+        self.ictm = ~self.ctm
+        self.lastPoint = Point()
+        self.lastPoint.__doc__ = "Position following last text insertion."
+        self.textRect = Rect(0, 0, -1, -1)
+        self.textRect.__doc__ = "Accumulated area of text spans."
+        self.used_fonts = set()
+        %}
+        TextWriter(PyObject *page_rect, float opacity=1, PyObject *color=NULL)
+        {
+            fz_text *text = NULL;
+            fz_try(gctx) {
+                text = fz_new_text(gctx);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct TextWriter *) text;
+        }
+
+        FITZEXCEPTION(append, !result)
+        %pythonprepend append %{
+        """Store 'text' at point 'pos' using 'font' and 'fontsize'."""
+
+        pos = Point(pos) * self.ictm
+        if font is None:
+            font = Font("helv")%}
+        %pythonappend append %{
+        self.lastPoint = Point(val[-2:]) * self.ctm
+        self.textRect = self._bbox * self.ctm
+        val = self.textRect, self.lastPoint
+        if font.flags["mono"] == 1:
+            self.used_fonts.add(font)
+        %}
+        PyObject *
+        append(PyObject *pos, char *text, struct Font *font=NULL, float fontsize=11, char *language=NULL, int wmode=0, int bidi_level=0)
+        {
+            fz_text_language lang = fz_text_language_from_string(language);
+            fz_bidi_direction markup_dir = 0;
+            fz_point p = JM_point_from_py(pos);
+            fz_matrix trm = fz_make_matrix(fontsize, 0, 0, fontsize, p.x, p.y);
+            fz_try(gctx) {
+                trm = fz_show_string(gctx, (fz_text *) $self, (fz_font *) font, trm, text, wmode, bidi_level, markup_dir, lang);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return JM_py_from_matrix(trm);
+        }
+
+        %pythoncode %{@property%}
+        %pythonappend _bbox%{val = Rect(val)%}
+        PyObject *_bbox()
+        {
+            return JM_py_from_rect(fz_bound_text(gctx, (fz_text *) $self, NULL, fz_identity));
+        }
+
+        FITZEXCEPTION(writeText, !result)
+        %pythonprepend writeText%{
+        """Write the text to a PDF page having the TextWriter's page size.
+
+        Args:
+            page: a PDF page having same size.
+            color: override text color.
+            opacity: override transparency.
+            overlay: put in foreground or background.
+            morph: tuple(Point, Matrix), apply Matrix with fixpoint Point.
+            render_mode: (int) PDF render mode operator 'Tr'.
+        """
+
+        CheckParent(page)
+        if abs(self.rect - page.rect) > 1e-3:
+            raise ValueError("incompatible page rect")
+        if morph is not None:
+            if (type(morph) not in (tuple, list)
+                or type(morph[0]) is not Point
+                or type(morph[1]) is not Matrix
+                ):
+                raise ValueError("morph must be (Point, Matrix) or None")
+        if getattr(opacity, "__float__", None) is None:
+            opacity = self.opacity
+        if color is None:
+            color = self.color
+        %}
+        %pythonappend writeText%{
+        max_nums = val[0]
+        content = val[1]
+        max_alp, max_font = max_nums
+        old_cont_lines = content.splitlines()
+
+        new_cont_lines = ["q"]
+
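+        # 'morph' conjugates morph[1] with a translation to the fixpoint p:
+        # a point x is mapped to x * ~delta * M * delta = (x - p) * M + p,
+        # so p itself stays fixed. For example, morph=(fitz.Point(100, 100),
+        # fitz.Matrix(30)) should rotate the text by 30 degrees about (100, 100).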
+        if morph:
+            p = morph[0] * self.ictm
+            delta = Matrix(1, 1).preTranslate(p.x, p.y)
+            matrix = ~delta * morph[1] * delta
+            new_cont_lines.append("%g %g %g %g %g %g cm" % JM_TUPLE(matrix))
+
+        for line in old_cont_lines:
+            if line.endswith(" cm"):
+                continue
+            if line == "BT":
+                new_cont_lines.append(line)
+                new_cont_lines.append("%i Tr" % render_mode)
+                continue
+            if line.endswith(" gs"):
+                alp = int(line.split()[0][4:]) + max_alp
+                line = "/Alp%i gs" % alp
+            elif line.endswith(" Tf"):
+                temp = line.split()
+                font = int(temp[0][2:]) + max_font
+                line = " ".join(["/F%i" % font] + temp[1:])
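+            # fill colors (rg / g / k) are additionally emitted as the matching
+            # stroke colors (RG / G / K), so render modes that stroke glyph
+            # outlines (e.g. render_mode=1) use the same color as filling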
+            elif line.endswith(" rg"):
+                new_cont_lines.append(line.replace("rg", "RG"))
+            elif line.endswith(" g"):
+                new_cont_lines.append(line.replace(" g", " G"))
+            elif line.endswith(" k"):
+                new_cont_lines.append(line.replace(" k", " K"))
+            new_cont_lines.append(line)
+        new_cont_lines.append("Q\n")
+        content = "\n".join(new_cont_lines).encode("utf-8")
+        TOOLS._insert_contents(page, content, overlay=overlay)
+        val = None
+        for font in self.used_fonts:
+            repair_mono_font(page, font)
+        %}
+        PyObject *writeText(struct Page *page, PyObject *color=NULL, float opacity=-1, int overlay=1, PyObject *morph=NULL, int render_mode=0)
+        {
+            pdf_page *pdfpage = pdf_page_from_fz_page(gctx, (fz_page *) page);
+            fz_rect mediabox = fz_bound_page(gctx, (fz_page *) page);
+            pdf_obj *resources = NULL;
+            fz_buffer *contents = NULL;
+            fz_device *dev = NULL;
+            PyObject *result = NULL, *max_nums, *cont_string;
+            float alpha = 1;
+            if (opacity >= 0 && opacity < 1)
+                alpha = opacity;
+            fz_colorspace *colorspace;
+            int ncol = 1;
+            float dev_color[4] = {0, 0, 0, 0};
+            if (color) JM_color_FromSequence(color, &ncol, dev_color);
+            switch(ncol) {
+                case 3: colorspace = fz_device_rgb(gctx); break;
+                case 4: colorspace = fz_device_cmyk(gctx); break;
+                default: colorspace = fz_device_gray(gctx); break;
+            }
+
+            fz_try(gctx) {
+                ASSERT_PDF(pdfpage);
+                resources = pdf_new_dict(gctx, pdfpage->doc, 5);
+                contents = fz_new_buffer(gctx, 1024);
+                dev = pdf_new_pdf_device(gctx, pdfpage->doc, fz_identity,
+                            mediabox, resources, contents);
+                fz_fill_text(gctx, dev, (fz_text *) $self, fz_identity,
+                    colorspace, dev_color, alpha, fz_default_color_params);
+                fz_close_device(gctx, dev);
+
+                // merge the generated resources into those of the page; max_nums
+                // holds the offsets (max_alp, max_font) used to rename /Alp and /F
+                max_nums = JM_merge_resources(gctx, pdfpage, resources);
+                cont_string = JM_EscapeStrFromBuffer(gctx, contents);
+                result = Py_BuildValue("OO", max_nums, cont_string);
+                Py_DECREF(cont_string);
+                Py_DECREF(max_nums);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, contents);
+                pdf_drop_obj(gctx, resources);
+                fz_drop_device(gctx, dev);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return result;
+        }
+        %pythoncode %{
+        def __del__(self):
+            if type(self) is not TextWriter:
+                return
+            self.__swig_destroy__(self)
+        %}
+    }
+};
+
+
+//-----------------------------------------------------------------------------
+// Font
+//-----------------------------------------------------------------------------
+struct Font
+{
+    %extend
+    {
+        ~Font()
+        {
+            DEBUGMSG1("Font");
+            fz_drop_font(gctx, (fz_font *) $self);
+            DEBUGMSG2;
+        }
+
+        FITZEXCEPTION(Font, !result)
+        %pythonprepend Font %{
+        if fontname:
+            if "/" in fontname or "\\" in fontname:
+                print("Warning: did you mean fontfile?")
+            try:  # CJK names map to MuPDF orderings 0-3 (china-t, china-s, japan, korea)
+                ordering = ("china-t", "china-s", "japan", "korea", "china-ts", "china-ss", "japan-s", "korea-s").index(fontname.lower()) % 4
+            except ValueError:
+                ordering = -1
+            if fontname.lower().startswith(("fig", "fim")):
+                try:
+                    import pymupdf_fonts  # optional fonts
+                    fontbuffer = pymupdf_fonts.myfont(fontname)[:]  # make a copy
+                    fontname = None  # ensure using fontbuffer only
+                    del pymupdf_fonts  # remove package again
+                except (ImportError, AttributeError):
+                    raise ImportError("Optional package 'pymupdf_fonts' not installed")
+            elif ordering < 0:
+                fontname = Base14_fontdict.get(fontname.lower(), fontname)
+        %}
+        Font(char *fontname=NULL, char *fontfile=NULL,
+             PyObject *fontbuffer=NULL, int script=0,
+             char *language=NULL, int ordering=-1, int is_bold=0,
+             int is_italic=0, int is_serif=0)
+        {
+            fz_font *font = NULL;
+            fz_try(gctx) {
+                fz_text_language lang = fz_text_language_from_string(language);
+                font = JM_get_font(gctx, fontname, fontfile,
+                           fontbuffer, script, lang, ordering,
+                           is_bold, is_italic, is_serif);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return (struct Font *) font;
+        }
+
+        %pythonprepend unicode_to_glyph_name
+        %{"""Return the glyph name of a unicode."""%}
+        PyObject *unicode_to_glyph_name(int c, char *language=NULL, int script=0)
+        {
+            fz_font *font;
+            fz_text_language lang = fz_text_language_from_string(language);
+            char name[32];
+            int gid = fz_encode_character_with_fallback(gctx, (fz_font *) $self, c, script, lang, &font);
+            fz_get_glyph_name(gctx, font, gid, name, sizeof(name));
+            return Py_BuildValue("s", name);
+        }
+
+
+        %pythonprepend glyph_name_to_unicode
+        %{"""Return the unicode for a glyph name."""%}
+        PyObject *glyph_name_to_unicode(const char *name)
+        {
+            return Py_BuildValue("i", fz_unicode_from_glyph_name(name));
+        }
+
+
+        %pythonprepend glyph_advance
+        %{"""Return the glyph advance width of a unicode (for fontsize 1)."""%}
+        float glyph_advance(int chr, char *language=NULL, int script=0, int wmode=0)
+        {
+            fz_font *font;
+            fz_text_language lang = fz_text_language_from_string(language);
+            int gid = fz_encode_character_with_fallback(gctx, (fz_font *) $self, chr, script, lang, &font);
+            return fz_advance_glyph(gctx, font, gid, wmode);
+        }
+
+        %pythonprepend has_glyph
+        %{"""Return whether the font, or a fallback font, has a glyph for this unicode."""%}
+        PyObject *has_glyph(int chr, char *language=NULL, int script=0)
+        {
+            fz_font *font;
+            fz_text_language lang = fz_text_language_from_string(language);
+            int gid = fz_encode_character_with_fallback(gctx, (fz_font *) $self, chr, script, lang, &font);
+            if (gid > 0) Py_RETURN_TRUE;
+            Py_RETURN_FALSE;
+        }
+
+
+        %pythoncode %{@property%}
+        PyObject *flags()
+        {
+            fz_font_flags_t *f = fz_font_flags((fz_font *) $self);
+            if (!f) Py_RETURN_NONE;
+            return Py_BuildValue("{s:i,s:i,s:i,s:i,s:i,s:i,s:i,s:i,s:i,s:i}",
+            "mono", f->is_mono, "serif", f->is_serif, "bold", f->is_bold,
+            "italic", f->is_italic, "substitute", f->ft_substitute,
+            "stretch", f->ft_stretch, "fake-bold", f->fake_bold,
+            "fake-italic", f->fake_italic, "opentype", f->has_opentype,
+            "invalid-bbox", f->invalid_bbox);
+        }
+
+
+        %pythoncode %{@property%}
+        PyObject *name()
+        {
+            return JM_UnicodeFromStr(fz_font_name(gctx, (fz_font *) $self));
+        }
+
+        %pythoncode %{@property%}
+        int glyph_count()
+        {
+            fz_font *this_font = (fz_font *) $self;
+            return this_font->glyph_count;
+        }
+
+        %pythoncode %{@property%}
+        %pythonappend bbox%{val = Rect(val)%}
+        PyObject *bbox()
+        {
+            fz_font *this_font = (fz_font *) $self;
+            return JM_py_from_rect(fz_font_bbox(gctx, this_font));
+        }
+
+        %pythoncode %{
+            def text_length(self, text, fontsize=11, wmode=0):
+                """Calculate the length of a string for this font."""
+                return fontsize * sum([self.glyph_advance(ord(c), wmode=wmode) for c in text])
+
+            def __repr__(self):
+                return "Font('%s')" % self.name
+
+            def __del__(self):
+                if type(self) is not Font:
+                    return None
+                self.__swig_destroy__(self)
+        %}
+    }
+};
+
+
+//-----------------------------------------------------------------------------
+// Tools - a collection of tools and utilities
+//-----------------------------------------------------------------------------
+struct Tools
+{
+    %extend
+    {
+        %pythonprepend gen_id
+        %{"""Return a unique positive integer."""%}
+        PyObject *gen_id()
+        {
+            JM_UNIQUE_ID += 1;
+            if (JM_UNIQUE_ID < 0) JM_UNIQUE_ID = 1;
+            return Py_BuildValue("i", JM_UNIQUE_ID);
+        }
+
+        FITZEXCEPTION(set_icc, !result)
+        %pythonprepend set_icc
+        %{"""Set ICC color handling on or off."""%}
+        PyObject *set_icc(int on=0)
+        {
+            fz_try(gctx) {
+                if (on) {
+                    if (FZ_ENABLE_ICC)
+                        fz_enable_icc(gctx);
+                    else
+                        THROWMSG("PyMuPDF generated without ICC components.");
+                }
+                else if (FZ_ENABLE_ICC) {
+                    fz_disable_icc(gctx);
+                }
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        %pythonprepend store_shrink
+        %{"""Free 'percent' of the current store size; return the remaining size."""%}
+        PyObject *store_shrink(int percent)
+        {
+            if (percent >= 100)
+            {
+                fz_empty_store(gctx);
+                return Py_BuildValue("i", 0);
+            }
+            if (percent > 0) fz_shrink_store(gctx, 100 - percent);
+            return Py_BuildValue("i", (int) gctx->store->size);
+        }
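+        // Usage sketch: fitz.TOOLS.store_shrink(50) asks MuPDF to shrink the
+        // store to half its current size and returns the resulting size in
+        // bytes; store_shrink(100) empties the store and returns 0.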
+
+
+        %pythoncode%{@property%}
+        %pythonprepend store_size
+        %{"""MuPDF current store size."""%}
+        PyObject *store_size()
+        {
+            return Py_BuildValue("i", (int) gctx->store->size);
+        }
+
+
+        %pythoncode%{@property%}
+        %pythonprepend store_maxsize
+        %{"""MuPDF store size limit."""%}
+        PyObject *store_maxsize()
+        {
+            return Py_BuildValue("i", (int) gctx->store->max);
+        }
+
+
+        %pythonprepend show_aa_level
+        %{"""Show anti-aliasing values."""%}
+        %pythonappend show_aa_level %{
+        temp = {"graphics": val[0], "text": val[1], "graphics_min_line_width": val[2]}
+        val = temp%}
+        PyObject *show_aa_level()
+        {
+            return Py_BuildValue("iif",
+                fz_graphics_aa_level(gctx),
+                fz_text_aa_level(gctx),
+                fz_graphics_min_line_width(gctx));
+        }
+
+
+        %pythonprepend set_aa_level
+        %{"""Set anti-aliasing level."""%}
+        void set_aa_level(int level)
+        {
+            fz_set_aa_level(gctx, level);
+        }
+
+
+        %pythonprepend set_graphics_min_line_width
+        %{"""Set the graphics minimum line width."""%}
+        void set_graphics_min_line_width(float min_line_width)
+        {
+            fz_set_graphics_min_line_width(gctx, min_line_width);
+        }
+
+
+        %pythonprepend image_profile
+        %{"""Metadata of an image binary stream."""%}
+        PyObject *image_profile(PyObject *stream, int keep_image=0)
+        {
+            return JM_image_profile(gctx, stream, keep_image);
+        }
+
+
+        PyObject *_rotate_matrix(struct Page *page)
+        {
+            pdf_page *pdfpage = pdf_page_from_fz_page(gctx, (fz_page *) page);
+            if (!pdfpage) return JM_py_from_matrix(fz_identity);
+            return JM_py_from_matrix(JM_rotate_page_matrix(gctx, pdfpage));
+        }
+
+
+        PyObject *_derotate_matrix(struct Page *page)
+        {
+            pdf_page *pdfpage = pdf_page_from_fz_page(gctx, (fz_page *) page);
+            if (!pdfpage) return JM_py_from_matrix(fz_identity);
+            return JM_py_from_matrix(JM_derotate_page_matrix(gctx, pdfpage));
+        }
+
+
+        %pythoncode%{@property%}
+        %pythonprepend fitz_config
+        %{"""PyMuPDF configuration parameters."""%}
+        PyObject *fitz_config()
+        {
+            return JM_fitz_config();
+        }
+
+
+        %pythonprepend glyph_cache_empty
+        %{"""Empty the glyph cache."""%}
+        void glyph_cache_empty()
+        {
+            fz_purge_glyph_cache(gctx);
+        }
+
+
+        FITZEXCEPTION(_fill_widget, !result)
+        %pythonappend _fill_widget %{
+            widget.rect = Rect(annot.rect)
+            widget.xref = annot.xref
+            widget.parent = annot.parent
+            widget._annot = annot  # backpointer to annot object
+            for attr in ("script", "script_stroke", "script_format",
+                         "script_change", "script_calc"):
+                if not getattr(widget, attr):
+                    setattr(widget, attr, None)
+        %}
+        PyObject *_fill_widget(struct Annot *annot, PyObject *widget)
+        {
+            fz_try(gctx) {
+                JM_get_widget_properties(gctx, (pdf_annot *) annot, widget);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        FITZEXCEPTION(_save_widget, !result)
+        PyObject *_save_widget(struct Annot *annot, PyObject *widget)
+        {
+            fz_try(gctx) {
+                JM_set_widget_properties(gctx, (pdf_annot *) annot, widget);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        FITZEXCEPTION(_reset_widget, !result)
+        PyObject *_reset_widget(struct Annot *annot)
+        {
+            fz_try(gctx) {
+                pdf_annot *this_annot = (pdf_annot *) annot;
+                pdf_document *pdf = pdf_get_bound_document(gctx, this_annot->obj);
+                pdf_field_reset(gctx, pdf, this_annot->obj);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        FITZEXCEPTION(_parse_da, !result)
+        %pythonappend _parse_da %{
+        if not val:
+            return ((0,), "", 0)
+        font = "Helv"
+        fsize = 12
+        col = (0, 0, 0)
+        dat = val.split()  # split on any whitespace
+        for i, item in enumerate(dat):
+            if item == "Tf":
+                font = dat[i - 2][1:]
+                fsize = float(dat[i - 1])
+                dat[i] = dat[i-1] = dat[i-2] = ""
+                continue
+            if item == "g":            # grayscale text
+                col = [float(dat[i - 1])]
+                dat[i] = dat[i-1] = ""
+                continue
+            if item == "rg":           # RGB colored text
+                col = [float(f) for f in dat[i - 3:i]]
+                dat[i] = dat[i-1] = dat[i-2] = dat[i-3] = ""
+                continue
+            if item == "k":           # CMYK colored text
+                col = [float(f) for f in dat[i - 4:i]]
+                dat[i] = dat[i-1] = dat[i-2] = dat[i-3] = dat[i-4] = ""
+                continue
+
+        val = (col, font, fsize)
+        %}
+        PyObject *_parse_da(struct Annot *annot)
+        {
+            char *da_str = NULL;
+            pdf_annot *this_annot = (pdf_annot *) annot;
+            fz_try(gctx) {
+                pdf_obj *da = pdf_dict_get_inheritable(gctx, this_annot->obj,
+                                                       PDF_NAME(DA));
+                if (!da) {
+                    pdf_obj *trailer = pdf_trailer(gctx, this_annot->page->doc);
+                    da = pdf_dict_getl(gctx, trailer, PDF_NAME(Root),
+                                       PDF_NAME(AcroForm),
+                                       PDF_NAME(DA),
+                                       NULL);
+                }
+                da_str = (char *) pdf_to_text_string(gctx, da);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return JM_UnicodeFromStr(da_str);
+        }
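+        // Example: for the /DA string "0 0 1 rg /Helv 12 Tf" the Python code
+        // appended above returns ([0.0, 0.0, 1.0], "Helv", 12.0), i.e.
+        // (color, fontname, fontsize); an empty /DA yields ((0,), "", 0).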
+
+
+        FITZEXCEPTION(_update_da, !result)
+        PyObject *_update_da(struct Annot *annot, char *da_str)
+        {
+            fz_try(gctx) {
+                pdf_annot *this_annot = (pdf_annot *) annot;
+                pdf_dict_put_text_string(gctx, this_annot->obj, PDF_NAME(DA), da_str);
+                pdf_dict_del(gctx, this_annot->obj, PDF_NAME(DS)); /* not supported */
+                pdf_dict_del(gctx, this_annot->obj, PDF_NAME(RC)); /* not supported */
+                pdf_dirty_annot(gctx, this_annot);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return_none;
+        }
+
+
+        FITZEXCEPTION(_get_all_contents, !result)
+        %pythonprepend _get_all_contents
+        %{"""Concatenate all /Contents objects of a page into a bytes object."""%}
+        PyObject *_get_all_contents(struct Page *fzpage)
+        {
+            pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) fzpage);
+            fz_buffer *res = NULL;
+            PyObject *result = NULL;
+            fz_try(gctx) {
+                ASSERT_PDF(page);
+                res = JM_read_contents(gctx, page->obj);
+                result = JM_BinFromBuffer(gctx, res);
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, res);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return result;
+        }
+
+
+        FITZEXCEPTION(_insert_contents, !result)
+        %pythonprepend _insert_contents
+        %{"""Add bytes as a new /Contents object for a page, and return its xref."""%}
+        PyObject *_insert_contents(struct Page *page, PyObject *newcont, int overlay=1)
+        {
+            fz_buffer *contbuf = NULL;
+            int xref = 0;
+            pdf_page *pdfpage = pdf_page_from_fz_page(gctx, (fz_page *) page);
+            fz_try(gctx) {
+                ASSERT_PDF(pdfpage);
+                contbuf = JM_BufferFromBytes(gctx, newcont);
+                xref = JM_insert_contents(gctx, pdfpage->doc, pdfpage->obj, contbuf, overlay);
+                pdfpage->doc->dirty = 1;
+            }
+            fz_always(gctx) {
+                fz_drop_buffer(gctx, contbuf);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            return Py_BuildValue("i", xref);
+        }
+
+        %pythonprepend mupdf_version
+        %{"""Get version of MuPDF binary build."""%}
+        PyObject *mupdf_version()
+        {
+            return Py_BuildValue("s", FZ_VERSION);
+        }
+
+        %pythonprepend mupdf_warnings
+        %{"""Return the MuPDF warnings/errors; empty the store afterwards if 'reset' is true (default)."""%}
+        %pythonappend mupdf_warnings %{
+        val = "\n".join(val)
+        if reset:
+            self.reset_mupdf_warnings()%}
+        PyObject *mupdf_warnings(int reset=1)
+        {
+            Py_INCREF(JM_mupdf_warnings_store);
+            return JM_mupdf_warnings_store;
+        }
+
+        int _int_from_language(char *language)
+        {
+            return fz_text_language_from_string(language);
+        }
+
+        %pythonprepend reset_mupdf_warnings
+        %{"""Empty the MuPDF warnings/errors store."""%}
+        void reset_mupdf_warnings()
+        {
+            Py_CLEAR(JM_mupdf_warnings_store);
+            JM_mupdf_warnings_store = PyList_New(0);
+        }
+
+        %pythonprepend mupdf_display_errors
+        %{"""Set MuPDF error display to True or False; return the current setting."""%}
+        PyObject *mupdf_display_errors(PyObject *value = NULL)
+        {
+            if (value == Py_True)
+                JM_mupdf_show_errors = Py_True;
+            else if (value == Py_False)
+                JM_mupdf_show_errors = Py_False;
+            Py_INCREF(JM_mupdf_show_errors);
+            return JM_mupdf_show_errors;
+        }
+
+        PyObject *_transform_rect(PyObject *rect, PyObject *matrix)
+        {
+            return JM_py_from_rect(fz_transform_rect(JM_rect_from_py(rect), JM_matrix_from_py(matrix)));
+        }
+
+        PyObject *_intersect_rect(PyObject *r1, PyObject *r2)
+        {
+            return JM_py_from_rect(fz_intersect_rect(JM_rect_from_py(r1),
+                                                     JM_rect_from_py(r2)));
+        }
+
+        PyObject *_include_point_in_rect(PyObject *r, PyObject *p)
+        {
+            return JM_py_from_rect(fz_include_point_in_rect(JM_rect_from_py(r),
+                                                     JM_point_from_py(p)));
+        }
+
+        PyObject *_transform_point(PyObject *point, PyObject *matrix)
+        {
+            return JM_py_from_point(fz_transform_point(JM_point_from_py(point), JM_matrix_from_py(matrix)));
+        }
+
+        PyObject *_union_rect(PyObject *r1, PyObject *r2)
+        {
+            return JM_py_from_rect(fz_union_rect(JM_rect_from_py(r1),
+                                                 JM_rect_from_py(r2)));
+        }
+
+        PyObject *_concat_matrix(PyObject *m1, PyObject *m2)
+        {
+            return JM_py_from_matrix(fz_concat(JM_matrix_from_py(m1),
+                                               JM_matrix_from_py(m2)));
+        }
+
+        PyObject *_invert_matrix(PyObject *matrix)
+        {
+            fz_matrix src = JM_matrix_from_py(matrix);
+            float a = src.a;
+            float det = a * src.d - src.b * src.c;
+            if (det < -FLT_EPSILON || det > FLT_EPSILON)
+            {
+                fz_matrix dst;
+                float rdet = 1 / det;
+                dst.a = src.d * rdet;
+                dst.b = -src.b * rdet;
+                dst.c = -src.c * rdet;
+                dst.d = a * rdet;
+                a = -src.e * dst.a - src.f * dst.c;
+                dst.f = -src.e * dst.b - src.f * dst.d;
+                dst.e = a;
+                return Py_BuildValue("(i, N)", 0, JM_py_from_matrix(dst));
+            }
+            return Py_BuildValue("(i, ())", 1);
+        }
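+        // The inverse uses the closed form for a PDF matrix [a b c d e f]
+        // with det = a*d - b*c:
+        //     a' = d/det, b' = -b/det, c' = -c/det, d' = a/det,
+        //     e' = -e*a' - f*c', f' = -e*b' - f*d'.
+        // Example: [2 0 0 2 10 20] -> (0, [0.5 0 0 0.5 -5 -10]); a singular
+        // matrix such as [1 2 2 4 0 0] -> (1, ()).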
+
+
+        float _measure_string(const char *text, const char *fontname, float fontsize,
+                             int encoding = 0)
+        {
+            fz_font *font = fz_new_base14_font(gctx, fontname);
+            float w = 0;
+            while (*text)
+            {
+                int c, g;
+                text += fz_chartorune(&c, text);
+                switch (encoding)
+                {
+                    case PDF_SIMPLE_ENCODING_GREEK:
+                        c = fz_iso8859_7_from_unicode(c); break;
+                    case PDF_SIMPLE_ENCODING_CYRILLIC:
+                        c = fz_windows_1251_from_unicode(c); break;
+                    default:
+                        c = fz_windows_1252_from_unicode(c); break;
+                }
+                if (c < 0) c = 0xB7;
+                g = fz_encode_character(gctx, font, c);
+                w += fz_advance_glyph(gctx, font, g, 0);
+            }
+            fz_drop_font(gctx, font);
+            return w * fontsize;
+        }
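+        // The result is the sum of the glyph advances (em units, i.e. for
+        // fontsize 1) multiplied by fontsize; characters without a code in the
+        // selected single-byte encoding are measured as 0xB7 (middle dot).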
+
+        PyObject *
+        _sine_between(PyObject *C, PyObject *P, PyObject *Q)
+        {
+            // calculate the sine between lines CP and QP
+            fz_point c = JM_point_from_py(C);
+            fz_point p = JM_point_from_py(P);
+            fz_point q = JM_point_from_py(Q);
+            fz_point s = fz_normalize_vector(fz_make_point(q.x - p.x, q.y - p.y));
+            fz_matrix m1 = fz_make_matrix(1, 0, 0, 1, -p.x, -p.y);
+            fz_matrix m2 = fz_make_matrix(s.x, -s.y, s.y, s.x, 0, 0);
+            m1 = fz_concat(m1, m2);
+            c = fz_transform_point(c, m1);
+            c = fz_normalize_vector(c);
+            return Py_BuildValue("f", c.y);
+        }
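+        // P is moved to the origin and PQ rotated onto the positive x-axis;
+        // the normalized image of C then has the sine of the angle between
+        // PQ and PC as its y coordinate.
+        // Example: C=(0, 1), P=(0, 0), Q=(1, 0) gives 1.0 (a right angle).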
+
+        PyObject *
+        _hor_matrix(PyObject *C, PyObject *P)
+        {
+            // calculate matrix m that maps the line from C to P to the x-axis,
+            // such that C * m = (0, 0), and the target line has same length.
+            fz_point c = JM_point_from_py(C);
+            fz_point p = JM_point_from_py(P);
+            fz_point s = fz_normalize_vector(fz_make_point(p.x - c.x, p.y - c.y));
+            fz_matrix m1 = fz_make_matrix(1, 0, 0, 1, -c.x, -c.y);
+            fz_matrix m2 = fz_make_matrix(s.x, -s.y, s.y, s.x, 0, 0);
+            return JM_py_from_matrix(fz_concat(m1, m2));
+        }
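+        // Example: C=(1, 1), P=(4, 5) yields a matrix that maps C to (0, 0)
+        // and P to (5, 0), since |CP| = 5 and the rotation preserves length.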
+
+
+        FITZEXCEPTION(set_font_width, !result)
+        PyObject *
+        set_font_width(struct Document *doc, int xref, int width)
+        {
+            pdf_document *pdf = pdf_specifics(gctx, (fz_document *) doc);
+            if (!pdf) Py_RETURN_FALSE;
+            pdf_obj *font = NULL, *dfonts = NULL;
+            fz_try(gctx) {
+                font = pdf_load_object(gctx, pdf, xref);
+                dfonts = pdf_dict_get(gctx, font, PDF_NAME(DescendantFonts));
+                if (pdf_is_array(gctx, dfonts)) {
+                    int i, n = pdf_array_len(gctx, dfonts);
+                    for (i = 0; i < n; i++) {
+                        pdf_obj *dfont = pdf_array_get(gctx, dfonts, i);
+                        // /W [0 65532 width]: CIDs 0 to 65532 all get 'width'
+                        pdf_obj *warray = pdf_new_array(gctx, pdf, 3);
+                        pdf_array_push_drop(gctx, warray, pdf_new_int(gctx, 0));
+                        pdf_array_push_drop(gctx, warray, pdf_new_int(gctx, 65532));
+                        pdf_array_push_drop(gctx, warray, pdf_new_int(gctx, width));
+                        pdf_dict_put_drop(gctx, dfont, PDF_NAME(W), warray);
+                    }
+                }
+            }
+            fz_always(gctx) {
+                pdf_drop_obj(gctx, font);
+            }
+            fz_catch(gctx) {
+                return NULL;
+            }
+            Py_RETURN_TRUE;
+        }
+
+
+        %pythoncode %{
+def _le_annot_parms(self, annot, p1, p2, fill_color):
+    """Get common parameters for making line end symbols.
+
+    Returns:
+        m: matrix that maps p1, p2 to points L, R on the x-axis
+        im: its inverse
+        L, R: transformed p1, p2
+        w: line width
+        scol: stroke color string
+        fcol: fill color string
+        opacity: opacity string (gs command)
+    """
+    w = annot.border["width"]  # line width
+    sc = annot.colors["stroke"]  # stroke color
+    if not sc:  # black if missing
+        sc = (0,0,0)
+    scol = " ".join(map(str, sc)) + " RG\n"
+    if fill_color:
+        fc = fill_color
+    else:
+        fc = annot.colors["fill"]  # fill color
+    if not fc:
+        fc = (1,1,1)  # white if missing
+    fcol = " ".join(map(str, fc)) + " rg\n"
+    # nr = annot.rect
+    np1 = p1                   # point coord relative to annot rect
+    np2 = p2                   # point coord relative to annot rect
+    m = Matrix(self._hor_matrix(np1, np2))  # matrix makes the line horizontal
+    im = ~m                            # inverted matrix
+    L = np1 * m                        # converted start (left) point
+    R = np2 * m                        # converted end (right) point
+    if 0 <= annot.opacity < 1:
+        opacity = "/H gs\n"
+    else:
+        opacity = ""
+    return m, im, L, R, w, scol, fcol, opacity
+
+def _oval_string(self, p1, p2, p3, p4):
+    """Return /AP string commands drawing an oval inside the 4-polygon p1, p2, p3, p4.
+    """
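+    # Example: for the square (0, 0), (2, 0), (2, 2), (0, 2) the path starts
+    # at the left side's midpoint (0, 1) and passes through the midpoints
+    # (1, 0), (2, 1) and (1, 2) before closing at (0, 1).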
+    def bezier(p, q, r):
+        f = "%f %f %f %f %f %f c\n"
+        return f % (p.x, p.y, q.x, q.y, r.x, r.y)
+
+    kappa = 0.55228474983              # 4/3 * (sqrt(2) - 1), Bezier circle constant
+    ml = p1 + (p4 - p1) * 0.5          # middle points ...
+    mo = p1 + (p2 - p1) * 0.5          # for each ...
+    mr = p2 + (p3 - p2) * 0.5          # polygon ...
+    mu = p4 + (p3 - p4) * 0.5          # side
+    ol1 = ml + (p1 - ml) * kappa       # the 8 bezier
+    ol2 = mo + (p1 - mo) * kappa       # helper points
+    or1 = mo + (p2 - mo) * kappa
+    or2 = mr + (p2 - mr) * kappa
+    ur1 = mr + (p3 - mr) * kappa
+    ur2 = mu + (p3 - mu) * kappa
+    ul1 = mu + (p4 - mu) * kappa
+    ul2 = ml + (p4 - ml) * kappa
+    # now draw, starting from middle point of left side
+    ap = "%f %f m\n" % (ml.x, ml.y)
+    ap += bezier(ol1, ol2, mo)
+    ap += bezier(or1, or2, mr)
+    ap += bezier(ur1, ur2, mu)
+    ap += bezier(ul1, ul2, ml)
+    return ap
+
+def _le_diamond(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for diamond line end symbol. "lr" selects the left (False) or right (True) point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    shift = 2.5             # 2*shift*width = length of square edge
+    d = shift * max(1, w)
+    M = R - (d/2., 0) if lr else L + (d/2., 0)
+    r = Rect(M, M) + (-d, -d, d, d)         # the square
+    # the square makes line longer by (2*shift - 1)*width
+    p = (r.tl + (r.bl - r.tl) * 0.5) * im
+    ap = "q\n%s%f %f m\n" % (opacity, p.x, p.y)
+    p = (r.tl + (r.tr - r.tl) * 0.5) * im
+    ap += "%f %f l\n"   % (p.x, p.y)
+    p = (r.tr + (r.br - r.tr) * 0.5) * im
+    ap += "%f %f l\n"   % (p.x, p.y)
+    p = (r.br + (r.bl - r.br) * 0.5) * im
+    ap += "%f %f l\n"   % (p.x, p.y)
+    ap += "%g w\n" % w
+    ap += scol + fcol + "b\nQ\n"
+    return ap
+
+def _le_square(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for square line end symbol. "lr" selects the left (False) or right (True) point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    shift = 2.5             # 2*shift*width = length of square edge
+    d = shift * max(1, w)
+    M = R - (d/2., 0) if lr else L + (d/2., 0)
+    r = Rect(M, M) + (-d, -d, d, d)         # the square
+    # the square makes line longer by (2*shift - 1)*width
+    p = r.tl * im
+    ap = "q\n%s%f %f m\n" % (opacity, p.x, p.y)
+    p = r.tr * im
+    ap += "%f %f l\n"   % (p.x, p.y)
+    p = r.br * im
+    ap += "%f %f l\n"   % (p.x, p.y)
+    p = r.bl * im
+    ap += "%f %f l\n"   % (p.x, p.y)
+    ap += "%g w\n" % w
+    ap += scol + fcol + "b\nQ\n"
+    return ap
+
+def _le_circle(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for circle line end symbol. "lr" denotes left (False) or right point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    shift = 2.5             # 2*shift*width = diameter of the circle
+    d = shift * max(1, w)
+    M = R - (d/2., 0) if lr else L + (d/2., 0)
+    r = Rect(M, M) + (-d, -d, d, d)         # bounding square of the circle
+    ap = "q\n" + opacity + self._oval_string(r.tl * im, r.tr * im, r.br * im, r.bl * im)
+    ap += "%g w\n" % w
+    ap += scol + fcol + "b\nQ\n"
+    return ap
+
+def _le_butt(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for butt line end symbol. "lr" denotes left (False) or right point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    shift = 3
+    d = shift * max(1, w)
+    M = R if lr else L
+    top = (M + (0, -d/2.)) * im
+    bot = (M + (0, d/2.)) * im
+    ap = "\nq\n%s%f %f m\n" % (opacity, top.x, top.y)
+    ap += "%f %f l\n" % (bot.x, bot.y)
+    ap += "%g w\n" % w
+    ap += scol + "s\nQ\n"
+    return ap
+
+def _le_slash(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for slash line end symbol. "lr" denotes left (False) or right point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    rw = 1.1547 * max(1, w) * 1.0         # makes rect diagonal a 30 deg inclination
+    M = R if lr else L
+    r = Rect(M.x - rw, M.y - 2 * w, M.x + rw, M.y + 2 * w)
+    top = r.tl * im
+    bot = r.br * im
+    ap = "\nq\n%s%f %f m\n" % (opacity, top.x, top.y)
+    ap += "%f %f l\n" % (bot.x, bot.y)
+    ap += "%g w\n" % w
+    ap += scol + "s\nQ\n"
+    return ap
+
+def _le_openarrow(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for open arrow line end symbol. "lr" denotes left (False) or right point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    shift = 2.5
+    d = shift * max(1, w)
+    p2 = R + (d/2., 0) if lr else L - (d/2., 0)
+    p1 = p2 + (-2*d, -d) if lr else p2 + (2*d, -d)
+    p3 = p2 + (-2*d, d) if lr else p2 + (2*d, d)
+    p1 *= im
+    p2 *= im
+    p3 *= im
+    ap = "\nq\n%s%f %f m\n" % (opacity, p1.x, p1.y)
+    ap += "%f %f l\n" % (p2.x, p2.y)
+    ap += "%f %f l\n" % (p3.x, p3.y)
+    ap += "%g w\n" % w
+    ap += scol + "S\nQ\n"
+    return ap
+
+def _le_closedarrow(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for closed arrow line end symbol. "lr" denotes left (False) or right point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    shift = 2.5
+    d = shift * max(1, w)
+    p2 = R + (d/2., 0) if lr else L - (d/2., 0)
+    p1 = p2 + (-2*d, -d) if lr else p2 + (2*d, -d)
+    p3 = p2 + (-2*d, d) if lr else p2 + (2*d, d)
+    p1 *= im
+    p2 *= im
+    p3 *= im
+    ap = "\nq\n%s%f %f m\n" % (opacity, p1.x, p1.y)
+    ap += "%f %f l\n" % (p2.x, p2.y)
+    ap += "%f %f l\n" % (p3.x, p3.y)
+    ap += "%g w\n" % w
+    ap += scol + fcol + "b\nQ\n"
+    return ap
+
+def _le_ropenarrow(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for right open arrow line end symbol. "lr" denotes left (False) or right point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    shift = 2.5
+    d = shift * max(1, w)
+    p2 = R - (d/3., 0) if lr else L + (d/3., 0)
+    p1 = p2 + (2*d, -d) if lr else p2 + (-2*d, -d)
+    p3 = p2 + (2*d, d) if lr else p2 + (-2*d, d)
+    p1 *= im
+    p2 *= im
+    p3 *= im
+    ap = "\nq\n%s%f %f m\n" % (opacity, p1.x, p1.y)
+    ap += "%f %f l\n" % (p2.x, p2.y)
+    ap += "%f %f l\n" % (p3.x, p3.y)
+    ap += "%g w\n" % w
+    ap += scol + fcol + "S\nQ\n"
+    return ap
+
+def _le_rclosedarrow(self, annot, p1, p2, lr, fill_color):
+    """Make stream commands for right closed arrow line end symbol. "lr" denotes left (False) or right point.
+    """
+    m, im, L, R, w, scol, fcol, opacity = self._le_annot_parms(annot, p1, p2, fill_color)
+    shift = 2.5
+    d = shift * max(1, w)
+    p2 = R - (2*d, 0) if lr else L + (2*d, 0)
+    p1 = p2 + (2*d, -d) if lr else p2 + (-2*d, -d)
+    p3 = p2 + (2*d, d) if lr else p2 + (-2*d, d)
+    p1 *= im
+    p2 *= im
+    p3 *= im
+    ap = "\nq\n%s%f %f m\n" % (opacity, p1.x, p1.y)
+    ap += "%f %f l\n" % (p2.x, p2.y)
+    ap += "%f %f l\n" % (p3.x, p3.y)
+    ap += "%g w\n" % w
+    ap += scol + fcol + "b\nQ\n"
+    return ap
+        %}
+    }
+};
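The `_oval_string` helper above approximates an ellipse with four cubic Bezier curves. Each control point sits a fraction `kappa` of the way from a side's midpoint toward the adjacent corner. The following is a minimal, self-contained sketch of the same construction on plain `(x, y)` tuples, so it can be checked without PyMuPDF. The names `lerp`, `bezier_at` and `oval_path` are hypothetical illustrations, not part of the PyMuPDF API. With `kappa = 4/3*(sqrt(2)-1)`, the midpoint of each arc lands exactly on the circle.

```python
import math

# 4/3 * (sqrt(2) - 1) ~= 0.55228474983, the same "magic number" as above
KAPPA = 4.0 * (math.sqrt(2.0) - 1.0) / 3.0


def lerp(a, b, t):
    """Point at fraction t on the way from a to b (2-tuples)."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def bezier_at(p0, c1, c2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t."""
    s = 1.0 - t
    return tuple(
        s**3 * a + 3 * s * s * t * b + 3 * s * t * t * c + t**3 * d
        for a, b, c, d in zip(p0, c1, c2, p3)
    )


def oval_path(p1, p2, p3, p4):
    """PDF path operators for the oval inscribed in the quad p1 -> p2 -> p3 -> p4."""
    ml = lerp(p1, p4, 0.5)  # middle of side p1-p4
    mo = lerp(p1, p2, 0.5)  # middle of side p1-p2
    mr = lerp(p2, p3, 0.5)  # middle of side p2-p3
    mu = lerp(p4, p3, 0.5)  # middle of side p4-p3

    def curve(c1, c2, end):
        # tuple concatenation gives the six operands of the "c" operator
        return "%f %f %f %f %f %f c\n" % (c1 + c2 + end)

    ap = "%f %f m\n" % ml  # start at the middle of the first side
    ap += curve(lerp(ml, p1, KAPPA), lerp(mo, p1, KAPPA), mo)
    ap += curve(lerp(mo, p2, KAPPA), lerp(mr, p2, KAPPA), mr)
    ap += curve(lerp(mr, p3, KAPPA), lerp(mu, p3, KAPPA), mu)
    ap += curve(lerp(mu, p4, KAPPA), lerp(ml, p4, KAPPA), ml)
    return ap


# Unit circle inscribed in the square (-1,-1)..(1,1)
corners = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
print(oval_path(*corners))

# Midpoint of the first arc: its distance from the center should be 1
ml = lerp(corners[0], corners[3], 0.5)
mo = lerp(corners[0], corners[1], 0.5)
x, y = bezier_at(ml, lerp(ml, corners[0], KAPPA), lerp(mo, corners[0], KAPPA), mo, 0.5)
print(math.hypot(x, y))
```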
diff --git a/fitz/helper-annot.i b/fitz/helper-annot.i

new file mode 100644

index 0000000..69b45a9
--- /dev/null
+++ b/fitz/helper-annot.i
@@ -0,0 +1,445 @@
+%{
+//----------------------------------------------------------------------------
+// return pdf_obj "border style" from Python str
+//----------------------------------------------------------------------------
+pdf_obj *JM_get_border_style(fz_context *ctx, PyObject *style)
+{
+    pdf_obj *val = PDF_NAME(S);
+    if (!style) return val;
+    char *s = JM_Python_str_AsChar(style);
+    JM_PyErr_Clear;
+    if (!s) return val;
+    if      (!strncmp(s, "b", 1) || !strncmp(s, "B", 1)) val = PDF_NAME(B);
+    else if (!strncmp(s, "d", 1) || !strncmp(s, "D", 1)) val = PDF_NAME(D);
+    else if (!strncmp(s, "i", 1) || !strncmp(s, "I", 1)) val = PDF_NAME(I);
+    else if (!strncmp(s, "u", 1) || !strncmp(s, "U", 1)) val = PDF_NAME(U);
+    JM_Python_str_DelForPy3(s);
+    return val;
+}
+
+//----------------------------------------------------------------------------
+// Make the /DA (default appearance) string of an annotation
+//----------------------------------------------------------------------------
+const char *JM_expand_fname(const char **name)
+{
+    if (!*name) return "Helv";
+    if (!strncmp(*name, "Co", 2)) return "Cour";
+    if (!strncmp(*name, "co", 2)) return "Cour";
+    if (!strncmp(*name, "Ti", 2)) return "TiRo";
+    if (!strncmp(*name, "ti", 2)) return "TiRo";
+    if (!strncmp(*name, "Sy", 2)) return "Symb";
+    if (!strncmp(*name, "sy", 2)) return "Symb";
+    if (!strncmp(*name, "Za", 2)) return "ZaDb";
+    if (!strncmp(*name, "za", 2)) return "ZaDb";
+    return "Helv";
+}
+
+void JM_make_annot_DA(fz_context *ctx, pdf_annot *annot, int ncol, float col[4], const char *fontname, float fontsize)
+{
+    fz_buffer *buf = NULL;
+    fz_try(ctx)
+    {
+        buf = fz_new_buffer(ctx, 50);
+        if (ncol == 1)
+            fz_append_printf(ctx, buf, "%g g ", col[0]);
+        else if (ncol == 3)
+            fz_append_printf(ctx, buf, "%g %g %g rg ", col[0], col[1], col[2]);
+        else
+            fz_append_printf(ctx, buf, "%g %g %g %g k ", col[0], col[1], col[2], col[3]);
+        fz_append_printf(ctx, buf, "/%s %g Tf", JM_expand_fname(&fontname), fontsize);
+        unsigned char *da = NULL;
+        size_t len = fz_buffer_storage(ctx, buf, &da);
+        pdf_dict_put_string(ctx, annot->obj, PDF_NAME(DA), (const char *) da, len);
+    }
+    fz_always(ctx) fz_drop_buffer(ctx, buf);
+    fz_catch(ctx) fz_rethrow(ctx);
+    return;
+}
+
+//----------------------------------------------------------------------------
+// refreshes the link and annotation tables of a page
+//----------------------------------------------------------------------------
+void JM_refresh_link_table(fz_context *ctx, pdf_page *page)
+{
+    fz_try(ctx)
+    {
+        pdf_obj *annots_arr = pdf_dict_get(ctx, page->obj, PDF_NAME(Annots));
+        if (annots_arr) {
+            fz_rect page_mediabox;
+            fz_matrix page_ctm;
+            pdf_page_transform(ctx, page, &page_mediabox, &page_ctm);
+            page->links = pdf_load_link_annots(ctx, page->doc, annots_arr,
+                                            pdf_to_num(ctx, page->obj), page_ctm);
+            pdf_load_annots(ctx, page, annots_arr);
+        }
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return;
+}
+
+
+PyObject *JM_annot_border(fz_context *ctx, pdf_obj *annot_obj)
+{
+    PyObject *res = PyDict_New();
+    PyObject *dash_py   = PyList_New(0);
+    PyObject *effect_py = PyList_New(0);
+    PyObject *val;
+    int i;
+    char *effect2 = NULL, *style = NULL;
+    float width = -1.0f;
+    int effect1 = -1;
+
+    pdf_obj *o = pdf_dict_get(ctx, annot_obj, PDF_NAME(Border));
+    if (pdf_is_array(ctx, o)) {
+        width = pdf_to_real(ctx, pdf_array_get(ctx, o, 2));
+        if (pdf_array_len(ctx, o) == 4) {
+            pdf_obj *dash = pdf_array_get(ctx, o, 3);
+            for (i = 0; i < pdf_array_len(ctx, dash); i++) {
+                val = Py_BuildValue("i", pdf_to_int(ctx, pdf_array_get(ctx, dash, i)));
+                LIST_APPEND_DROP(dash_py, val);
+            }
+        }
+    }
+
+    pdf_obj *bs_o = pdf_dict_get(ctx, annot_obj, PDF_NAME(BS));
+    if (bs_o)
+    {
+        o = pdf_dict_get(ctx, bs_o, PDF_NAME(W));
+        if (o) width = pdf_to_real(ctx, o);
+        o = pdf_dict_get(ctx, bs_o, PDF_NAME(S));
+        if (o) style = (char *) pdf_to_name(ctx, o);
+        o = pdf_dict_get(ctx, bs_o, PDF_NAME(D));
+        if (o) {
+            for (i = 0; i < pdf_array_len(ctx, o); i++) {
+                val = Py_BuildValue("i", pdf_to_int(ctx, pdf_array_get(ctx, o, i)));
+                LIST_APPEND_DROP(dash_py, val);
+            }
+        }
+    }
+
+    pdf_obj *be_o = pdf_dict_gets(ctx, annot_obj, "BE");
+    if (be_o) {
+        o = pdf_dict_get(ctx, be_o, PDF_NAME(S));
+        if (o) effect2 = (char *) pdf_to_name(ctx, o);
+        o = pdf_dict_get(ctx, be_o, PDF_NAME(I));
+        if (o) effect1 = pdf_to_int(ctx, o);
+    }
+
+    LIST_APPEND_DROP(effect_py, Py_BuildValue("i", effect1));
+    LIST_APPEND_DROP(effect_py, Py_BuildValue("s", effect2));
+    DICT_SETITEM_DROP(res, dictkey_width, Py_BuildValue("f", width));
+    DICT_SETITEM_DROP(res, dictkey_dashes, dash_py);
+    DICT_SETITEM_DROP(res, dictkey_style, Py_BuildValue("s", style));
+    if (effect1 > -1) PyDict_SetItem(res, dictkey_effect, effect_py);
+    Py_CLEAR(effect_py);
+    return res;
+}
+
+PyObject *JM_annot_set_border(fz_context *ctx, PyObject *border, pdf_document *doc, pdf_obj *annot_obj)
+{
+    if (!PyDict_Check(border)) {
+        JM_Warning("arg must be a dict");
+        Py_RETURN_NONE;     // not a dict
+    }
+
+    double nwidth = -1;                       // new width
+    double owidth = -1;                       // old width
+    PyObject *ndashes = NULL;                 // new dashes
+    PyObject *odashes = NULL;                 // old dashes
+    PyObject *nstyle  = NULL;                 // new style
+    PyObject *ostyle  = NULL;                 // old style
+
+    nwidth = PyFloat_AsDouble(PyDict_GetItem(border, dictkey_width));
+    ndashes = PyDict_GetItem(border, dictkey_dashes);
+    nstyle  = PyDict_GetItem(border, dictkey_style);
+
+    // first get old border properties
+    PyObject *oborder = JM_annot_border(ctx, annot_obj);
+    owidth = PyFloat_AsDouble(PyDict_GetItem(oborder, dictkey_width));
+    odashes = PyDict_GetItem(oborder, dictkey_dashes);
+    ostyle = PyDict_GetItem(oborder, dictkey_style);
+
+    // then delete any relevant entries
+    pdf_dict_del(ctx, annot_obj, PDF_NAME(BS));
+    pdf_dict_del(ctx, annot_obj, PDF_NAME(BE));
+    pdf_dict_del(ctx, annot_obj, PDF_NAME(Border));
+
+    Py_ssize_t i, n;
+    int d;
+    // populate new border array
+    if (nwidth < 0) nwidth = owidth;     // no new width: take current
+    if (nwidth < 0) nwidth = 0.0f;       // default if no width given
+    if (!ndashes) ndashes = odashes;     // no new dashes: take old
+    if (!nstyle)  nstyle  = ostyle;      // no new style: take old
+
+    if (ndashes && PySequence_Check(ndashes) && PySequence_Size(ndashes) > 0) {
+        n = PySequence_Size(ndashes);
+        pdf_obj *darr = pdf_new_array(ctx, doc, n);
+        for (i = 0; i < n; i++) {
+            d = (int) PyInt_AsLong(PySequence_ITEM(ndashes, i));
+            pdf_array_push_int(ctx, darr, (int64_t) d);
+        }
+        pdf_dict_putl_drop(ctx, annot_obj, darr, PDF_NAME(BS), PDF_NAME(D), NULL);
+        nstyle = PyUnicode_FromString("D");
+    }
+
+    pdf_dict_putl_drop(ctx, annot_obj, pdf_new_real(ctx, nwidth),
+                               PDF_NAME(BS), PDF_NAME(W), NULL);
+
+    pdf_obj *val = JM_get_border_style(ctx, nstyle);
+
+    pdf_dict_putl_drop(ctx, annot_obj, val,
+                               PDF_NAME(BS), PDF_NAME(S), NULL);
+    Py_CLEAR(oborder);  // odashes and ostyle were borrowed references into it
+    PyErr_Clear();
+    Py_RETURN_NONE;
+}
+
+PyObject *JM_annot_colors(fz_context *ctx, pdf_obj *annot_obj)
+{
+    PyObject *res = PyDict_New();
+    PyObject *bc = PyList_New(0);        // stroke colors
+    PyObject *fc = PyList_New(0);        // fill colors
+    int i;
+    float col;
+    pdf_obj *o = pdf_dict_get(ctx, annot_obj, PDF_NAME(C));
+    if (pdf_is_array(ctx, o)) {
+        int n = pdf_array_len(ctx, o);
+        for (i = 0; i < n; i++) {
+            col = pdf_to_real(ctx, pdf_array_get(ctx, o, i));
+            LIST_APPEND_DROP(bc, Py_BuildValue("f", col));
+        }
+    }
+    DICT_SETITEM_DROP(res, dictkey_stroke, bc);
+
+    o = pdf_dict_gets(ctx, annot_obj, "IC");
+    if (pdf_is_array(ctx, o)) {
+        int n = pdf_array_len(ctx, o);
+        for (i = 0; i < n; i++) {
+            col = pdf_to_real(ctx, pdf_array_get(ctx, o, i));
+            LIST_APPEND_DROP(fc, Py_BuildValue("f", col));
+        }
+    }
+    DICT_SETITEM_DROP(res, dictkey_fill, fc);
+
+    return res;
+}
+
+//----------------------------------------------------------------------------
+// delete an annotation using mupdf functions, but first delete the /AP and
+// /Popup dict keys in annot->obj. Also remove the 'Popup' annotation
+// from the page's /Annots array which may also exist.
+//----------------------------------------------------------------------------
+void JM_delete_annot(fz_context *ctx, pdf_page *page, pdf_annot *annot)
+{
+    if (!annot) return;
+    fz_try(ctx) {
+        // first get any existing popup for the annotation
+        pdf_obj *popup = pdf_dict_get(ctx, annot->obj, PDF_NAME(Popup));
+
+
+        // next delete the /Popup and /AP entries from annot dictionary
+        pdf_dict_del(ctx, annot->obj, PDF_NAME(Popup));
+        pdf_dict_del(ctx, annot->obj, PDF_NAME(AP));
+
+        // if there exists a /Popup, find and destroy it. The right popup
+        // has a /Parent entry which points to our annotation.
+
+        pdf_obj *annots = pdf_dict_get(ctx, page->obj, PDF_NAME(Annots));
+        int i, n = pdf_array_len(ctx, annots);
+        for (i = n - 1; i >= 0; i--) {
+            pdf_obj *o = pdf_array_get(ctx, annots, i);
+            pdf_obj *p = pdf_dict_get(ctx, o, PDF_NAME(Parent));
+            if (!p) continue;
+            if (!pdf_objcmp(ctx, p, annot->obj)) {
+                pdf_array_delete(ctx, annots, i);
+            }
+        }
+
+        pdf_delete_annot(ctx, page, annot);
+    }
+    fz_catch(ctx) {
+        fz_warn(ctx, "could not delete annotation");
+    }
+    return;
+}
+
+//----------------------------------------------------------------------------
+// Return the first annotation whose /IRT key ("In Response To") points to
+// annot. Used to remove the response chain of a given annotation.
+//----------------------------------------------------------------------------
+pdf_annot *JM_find_annot_irt(fz_context *ctx, pdf_annot *annot)
+{
+    pdf_annot *irt_annot = NULL;  // returning this
+    pdf_obj *o = NULL;
+    pdf_annot **annotptr;
+    int found = 0;
+    fz_try(ctx) {   // loop through MuPDF's internal annots list
+        pdf_page *page = annot->page;
+        for (annotptr = &page->annots; *annotptr; annotptr = &(*annotptr)->next) {
+            irt_annot = *annotptr;  // check if this is what we are looking for
+            o = pdf_dict_gets(ctx, irt_annot->obj, "IRT");
+            if (o) {
+                if (!pdf_objcmp(ctx, o, annot->obj)) {
+                    found = 1;
+                    break;
+                }
+            }
+        }
+    }
+    fz_catch(ctx) {;}
+    if (found) return irt_annot;
+    return NULL;
+}
+
+//----------------------------------------------------------------------------
+// return the identifications of a page's annotations (list of /NM entries)
+//----------------------------------------------------------------------------
+PyObject *JM_get_annot_id_list(fz_context *ctx, pdf_page *page)
+{
+    PyObject *names = PyList_New(0);
+    pdf_obj *annot_obj = NULL;
+    pdf_obj *annots = pdf_dict_get(ctx, page->obj, PDF_NAME(Annots));
+    pdf_obj *name = NULL;
+    if (!annots) return names;
+    fz_try(ctx) {
+        int i, n = pdf_array_len(ctx, annots);
+        for (i = 0; i < n; i++) {
+            annot_obj = pdf_array_get(ctx, annots, i);
+            name = pdf_dict_gets(ctx, annot_obj, "NM");
+            if (name) {
+                LIST_APPEND_DROP(names, Py_BuildValue("s", pdf_to_text_string(ctx, name)));
+            }
+        }
+    }
+    fz_catch(ctx) {
+        return names;
+    }
+    return names;
+}
+
+
+//----------------------------------------------------------------------------
+// return (xref, annot type) tuples for a page's annots, links and fields
+//----------------------------------------------------------------------------
+PyObject *JM_get_annot_xref_list(fz_context *ctx, pdf_page *page)
+{
+    PyObject *names = PyList_New(0);
+    pdf_obj *annot_obj = NULL;
+    pdf_obj *annots = pdf_dict_get(ctx, page->obj, PDF_NAME(Annots));
+    pdf_obj *name = NULL;
+    if (!annots) return names;
+    fz_try(ctx) {
+        int i, n = pdf_array_len(ctx, annots);
+        for (i = 0; i < n; i++) {
+            annot_obj = pdf_array_get(ctx, annots, i);
+            int xref = pdf_to_num(ctx, annot_obj);
+            pdf_obj *subtype = pdf_dict_get(ctx, annot_obj, PDF_NAME(Subtype));
+            int type = PDF_ANNOT_UNKNOWN;
+            if (subtype) {
+                const char *name = pdf_to_name(ctx, subtype);
+                type = pdf_annot_type_from_string(ctx, name);
+            }
+            LIST_APPEND_DROP(names, Py_BuildValue("ii", xref, type));
+        }
+    }
+    fz_catch(ctx) {
+        return names;
+    }
+    return names;
+}
+
+
+//----------------------------------------------------------------------------
+// Add a unique /NM key to an annotation or widget.
+// Append a number to 'stem' such that the result is a unique name.
+//----------------------------------------------------------------------------
+void JM_add_annot_id(fz_context *ctx, pdf_annot *annot, char *stem)
+{
+    fz_try(ctx) {
+        PyObject *names = NULL;
+        names = JM_get_annot_id_list(ctx, annot->page);
+
+        int i = 0;
+        PyObject *stem_id = NULL;
+        while (1) {
+            stem_id = PyUnicode_FromFormat("%s-%d", stem, i);
+            if (!PySequence_Contains(names, stem_id)) break;
+            i += 1;
+            Py_DECREF(stem_id);
+        }
+        char *response = JM_Python_str_AsChar(stem_id);
+        pdf_obj *name = pdf_new_string(ctx, (const char *) response, strlen(response));
+        pdf_dict_puts_drop(ctx, annot->obj, "NM", name);
+        JM_Python_str_DelForPy3(response);
+        Py_CLEAR(stem_id);
+        Py_CLEAR(names);
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+}
+
+//----------------------------------------------------------------------------
+// retrieve annot by name (/NM key)
+//----------------------------------------------------------------------------
+pdf_annot *JM_get_annot_by_name(fz_context *ctx, pdf_page *page, char *name)
+{
+    if (!name || strlen(name) == 0) {
+        return NULL;
+    }
+    pdf_annot **annotptr = NULL;
+    pdf_annot *annot = NULL;
+    int found = 0;
+    size_t len = 0;
+
+    fz_try(ctx) {   // loop through MuPDF's internal annots list (widgets are not searched)
+        for (annotptr = &page->annots; *annotptr; annotptr = &(*annotptr)->next) {
+            annot = *annotptr;
+            const char *response = pdf_to_string(ctx, pdf_dict_gets(ctx, annot->obj, "NM"), &len);
+            if (strcmp(name, response) == 0) {
+                found = 1;
+                break;
+            }
+        }
+        if (!found) {
+            fz_throw(ctx, FZ_ERROR_GENERIC, "'%s' is not an annot of this page", name);
+        }
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return pdf_keep_annot(ctx, annot);
+}
+
+//----------------------------------------------------------------------------
+// retrieve annot by its xref
+//----------------------------------------------------------------------------
+pdf_annot *JM_get_annot_by_xref(fz_context *ctx, pdf_page *page, int xref)
+{
+    pdf_annot **annotptr = NULL;
+    pdf_annot *annot = NULL;
+    int found = 0;
+    size_t len = 0;
+
+    fz_try(ctx) {   // loop through MuPDF's internal annots list
+        for (annotptr = &page->annots; *annotptr; annotptr = &(*annotptr)->next) {
+            annot = *annotptr;
+            if (xref == pdf_to_num(ctx, annot->obj)) {
+                found = 1;
+                break;
+            }
+        }
+        if (!found) {
+            fz_throw(ctx, FZ_ERROR_GENERIC, "xref %d is not an annot of this page", xref);
+        }
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return pdf_keep_annot(ctx, annot);
+}
+
+%}
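Two of the helpers above implement small, self-contained rules. `JM_get_border_style` picks the border style from the first letter (case-insensitive) of a string: B, D, I or U, otherwise S. `JM_add_annot_id` appends `-0`, `-1`, ... to a stem until the resulting /NM name is not already used on the page. The following is a plain-Python sketch of both rules, for illustration only. The function names `border_style_name` and `unique_annot_id` and the sample names are hypothetical and not part of PyMuPDF.

```python
def border_style_name(style):
    """Sketch of JM_get_border_style: map a style string to a /BS /S name."""
    if not style:
        return "S"  # solid is the default
    first = style[0].upper()
    return first if first in "BDIU" else "S"


def unique_annot_id(stem, existing):
    """Sketch of JM_add_annot_id: first '<stem>-<i>' (i = 0, 1, ...) not in existing."""
    i = 0
    while True:
        candidate = "%s-%d" % (stem, i)
        if candidate not in existing:
            return candidate
        i += 1


print(border_style_name("dashed"))  # D
print(border_style_name("solid"))   # S
page_names = ["fitz-0", "fitz-1", "other-0"]  # hypothetical /NM values on one page
print(unique_annot_id("fitz", page_names))    # fitz-2
```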
diff --git a/fitz/helper-convert.i b/fitz/helper-convert.i

new file mode 100644

index 0000000..ea1d23e
--- /dev/null
+++ b/fitz/helper-convert.i
@@ -0,0 +1,84 @@
+%{
+//-----------------------------------------------------------------------------
+// Convert any MuPDF document to a PDF
+// Returns a bytes object containing the PDF, created via pdf_write_document.
+//-----------------------------------------------------------------------------
+PyObject *JM_convert_to_pdf(fz_context *ctx, fz_document *doc, int fp, int tp, int rotate)
+{
+    pdf_document *pdfout = pdf_create_document(ctx);  // new PDF document
+    int i, incr = 1, s = fp, e = tp;
+    if (fp > tp) {
+        incr = -1;           // count backwards
+        s = tp;              // adjust ...
+        e = fp;              // ... range
+    }
+    fz_rect mediabox;
+    int rot = JM_norm_rotation(rotate);
+    fz_device *dev = NULL;
+    fz_buffer *contents = NULL;
+    pdf_obj *resources = NULL;
+    fz_page *page = NULL;
+    fz_var(dev);
+    fz_var(contents);
+    fz_var(resources);
+    fz_var(page);
+    for (i = fp; INRANGE(i, s, e); i += incr) {  // interpret & write document pages as PDF pages
+        fz_try(ctx) {
+            page = fz_load_page(ctx, doc, i);
+            mediabox = fz_bound_page(ctx, page);
+            dev = pdf_page_write(ctx, pdfout, mediabox, &resources, &contents);
+            fz_run_page(ctx, page, dev, fz_identity, NULL);
+            fz_close_device(ctx, dev);
+            fz_drop_device(ctx, dev);
+            dev = NULL;
+            pdf_obj *page_obj = pdf_add_page(ctx, pdfout, mediabox, rot, resources, contents);
+            pdf_insert_page(ctx, pdfout, -1, page_obj);
+            pdf_drop_obj(ctx, page_obj);
+        }
+        fz_always(ctx) {
+            pdf_drop_obj(ctx, resources); resources = NULL;
+            fz_drop_buffer(ctx, contents); contents = NULL;
+            fz_drop_device(ctx, dev); dev = NULL;
+            fz_drop_page(ctx, page); page = NULL;
+        }
+        fz_catch(ctx) {
+            pdf_drop_document(ctx, pdfout); fz_rethrow(ctx);
+        }
+    }
+    // PDF created - now write it to a Python bytes object
+    PyObject *r = NULL;
+    fz_output *out = NULL;
+    fz_buffer *res = NULL;
+    // prepare write options structure
+    pdf_write_options opts = { 0 };
+    opts.do_garbage         = 4;
+    opts.do_compress        = 1;
+    opts.do_compress_images = 1;
+    opts.do_compress_fonts  = 1;
+    opts.do_sanitize        = 1;
+    opts.do_incremental     = 0;
+    opts.do_ascii           = 0;
+    opts.do_decompress      = 0;
+    opts.do_linear          = 0;
+    opts.do_clean           = 1;
+    opts.do_pretty          = 0;
+
+    fz_try(ctx) {
+        res = fz_new_buffer(ctx, 8192);
+        out = fz_new_output_with_buffer(ctx, res);
+        pdf_write_document(ctx, pdfout, out, &opts);
+        unsigned char *c = NULL;
+        size_t len = fz_buffer_storage(ctx, res, &c);
+        r = PyBytes_FromStringAndSize((const char *) c, (Py_ssize_t) len);
+    }
+    fz_always(ctx) {
+        pdf_drop_document(ctx, pdfout);
+        fz_drop_output(ctx, out);
+        fz_drop_buffer(ctx, res);
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return r;
+}
+%}
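`JM_convert_to_pdf` accepts a page range `fp..tp` in either direction. When `fp > tp` it counts backwards, and it swaps the bounds so that a single range check works for both cases. The following is a small Python sketch of just that iteration logic, for checking which pages get converted and in what order. It assumes `INRANGE(i, s, e)` is an inclusive check `s <= i <= e`. The function name `page_order` is hypothetical.

```python
def page_order(fp, tp):
    """Page numbers visited for the range fp..tp (inclusive, either direction)."""
    incr, s, e = 1, fp, tp
    if fp > tp:
        incr, s, e = -1, tp, fp  # count backwards over the adjusted range
    pages = []
    i = fp
    while s <= i <= e:  # assumed meaning of INRANGE(i, s, e)
        pages.append(i)
        i += incr
    return pages


print(page_order(0, 3))  # [0, 1, 2, 3]
print(page_order(3, 0))  # [3, 2, 1, 0]
```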
diff --git a/fitz/helper-defines.i b/fitz/helper-defines.i

new file mode 100644

index 0000000..629556b
--- /dev/null
+++ b/fitz/helper-defines.i
@@ -0,0 +1,400 @@
+%inline %{
+//----------------------------------------------------------------------------
+// general
+//----------------------------------------------------------------------------
+#define EPSILON 1e-5
+
+//----------------------------------------------------------------------------
+// annotation types
+//----------------------------------------------------------------------------
+#define PDF_ANNOT_TEXT 0
+#define PDF_ANNOT_LINK 1
+#define PDF_ANNOT_FREE_TEXT 2
+#define PDF_ANNOT_LINE 3
+#define PDF_ANNOT_SQUARE 4
+#define PDF_ANNOT_CIRCLE 5
+#define PDF_ANNOT_POLYGON 6
+#define PDF_ANNOT_POLY_LINE 7
+#define PDF_ANNOT_HIGHLIGHT 8
+#define PDF_ANNOT_UNDERLINE 9
+#define PDF_ANNOT_SQUIGGLY 10
+#define PDF_ANNOT_STRIKE_OUT 11
+#define PDF_ANNOT_REDACT 12
+#define PDF_ANNOT_STAMP 13
+#define PDF_ANNOT_CARET 14
+#define PDF_ANNOT_INK 15
+#define PDF_ANNOT_POPUP 16
+#define PDF_ANNOT_FILE_ATTACHMENT 17
+#define PDF_ANNOT_SOUND 18
+#define PDF_ANNOT_MOVIE 19
+#define PDF_ANNOT_WIDGET 20
+#define PDF_ANNOT_SCREEN 21
+#define PDF_ANNOT_PRINTER_MARK 22
+#define PDF_ANNOT_TRAP_NET 23
+#define PDF_ANNOT_WATERMARK 24
+#define PDF_ANNOT_3D 25
+#define PDF_ANNOT_UNKNOWN -1
+
+
+//----------------------------------------------------------------------------
+// annotation flag bits
+//----------------------------------------------------------------------------
+#define PDF_ANNOT_IS_INVISIBLE 1 << (1-1)
+#define PDF_ANNOT_IS_HIDDEN 1 << (2-1)
+#define PDF_ANNOT_IS_PRINT 1 << (3-1)
+#define PDF_ANNOT_IS_NO_ZOOM 1 << (4-1)
+#define PDF_ANNOT_IS_NO_ROTATE 1 << (5-1)
+#define PDF_ANNOT_IS_NO_VIEW 1 << (6-1)
+#define PDF_ANNOT_IS_READ_ONLY 1 << (7-1)
+#define PDF_ANNOT_IS_LOCKED 1 << (8-1)
+#define PDF_ANNOT_IS_TOGGLE_NO_VIEW 1 << (9-1)
+#define PDF_ANNOT_IS_LOCKED_CONTENTS 1 << (10-1)
+
+
+//----------------------------------------------------------------------------
+// annotation line ending styles
+//----------------------------------------------------------------------------
+#define PDF_ANNOT_LE_NONE 0
+#define PDF_ANNOT_LE_SQUARE 1
+#define PDF_ANNOT_LE_CIRCLE 2
+#define PDF_ANNOT_LE_DIAMOND 3
+#define PDF_ANNOT_LE_OPEN_ARROW 4
+#define PDF_ANNOT_LE_CLOSED_ARROW 5
+#define PDF_ANNOT_LE_BUTT 6
+#define PDF_ANNOT_LE_R_OPEN_ARROW 7
+#define PDF_ANNOT_LE_R_CLOSED_ARROW 8
+#define PDF_ANNOT_LE_SLASH 9
+
+
+//----------------------------------------------------------------------------
+// annotation field (widget) types
+//----------------------------------------------------------------------------
+#define PDF_WIDGET_TYPE_UNKNOWN 0
+#define PDF_WIDGET_TYPE_BUTTON 1
+#define PDF_WIDGET_TYPE_CHECKBOX 2
+#define PDF_WIDGET_TYPE_COMBOBOX 3
+#define PDF_WIDGET_TYPE_LISTBOX 4
+#define PDF_WIDGET_TYPE_RADIOBUTTON 5
+#define PDF_WIDGET_TYPE_SIGNATURE 6
+#define PDF_WIDGET_TYPE_TEXT 7
+
+
+//----------------------------------------------------------------------------
+// annotation text widget subtypes
+//----------------------------------------------------------------------------
+#define PDF_WIDGET_TX_FORMAT_NONE 0
+#define PDF_WIDGET_TX_FORMAT_NUMBER 1
+#define PDF_WIDGET_TX_FORMAT_SPECIAL 2
+#define PDF_WIDGET_TX_FORMAT_DATE 3
+#define PDF_WIDGET_TX_FORMAT_TIME 4
+
+
+//----------------------------------------------------------------------------
+// annotation widget flags
+//----------------------------------------------------------------------------
+// Common to all field types
+#define PDF_FIELD_IS_READ_ONLY 1
+#define PDF_FIELD_IS_REQUIRED 1 << 1
+#define PDF_FIELD_IS_NO_EXPORT 1 << 2
+
+
+// Text fields
+#define PDF_TX_FIELD_IS_MULTILINE  1 << 12
+#define PDF_TX_FIELD_IS_PASSWORD  1 << 13
+#define PDF_TX_FIELD_IS_FILE_SELECT  1 << 20
+#define PDF_TX_FIELD_IS_DO_NOT_SPELL_CHECK  1 << 22
+#define PDF_TX_FIELD_IS_DO_NOT_SCROLL  1 << 23
+#define PDF_TX_FIELD_IS_COMB  1 << 24
+#define PDF_TX_FIELD_IS_RICH_TEXT  1 << 25
+
+
+// Button fields
+#define PDF_BTN_FIELD_IS_NO_TOGGLE_TO_OFF  1 << 14
+#define PDF_BTN_FIELD_IS_RADIO  1 << 15
+#define PDF_BTN_FIELD_IS_PUSHBUTTON  1 << 16
+#define PDF_BTN_FIELD_IS_RADIOS_IN_UNISON  1 << 25
+
+
+// Choice fields
+#define PDF_CH_FIELD_IS_COMBO  1 << 17
+#define PDF_CH_FIELD_IS_EDIT  1 << 18
+#define PDF_CH_FIELD_IS_SORT  1 << 19
+#define PDF_CH_FIELD_IS_MULTI_SELECT  1 << 21
+#define PDF_CH_FIELD_IS_DO_NOT_SPELL_CHECK  1 << 22
+#define PDF_CH_FIELD_IS_COMMIT_ON_SEL_CHANGE  1 << 26
+
+
+// Signature field errors
+#define PDF_SIGNATURE_ERROR_OKAY 0
+#define PDF_SIGNATURE_ERROR_NO_SIGNATURES 1
+#define PDF_SIGNATURE_ERROR_NO_CERTIFICATE 2
+#define PDF_SIGNATURE_ERROR_DIGEST_FAILURE 3
+#define PDF_SIGNATURE_ERROR_SELF_SIGNED 4
+#define PDF_SIGNATURE_ERROR_SELF_SIGNED_IN_CHAIN 5
+#define PDF_SIGNATURE_ERROR_NOT_TRUSTED 6
+#define PDF_SIGNATURE_ERROR_UNKNOWN 7
+
+
+//----------------------------------------------------------------------------
+// colorspace identifiers
+//----------------------------------------------------------------------------
+#define CS_RGB  1
+#define CS_GRAY 2
+#define CS_CMYK 3
+
+//----------------------------------------------------------------------------
+// PDF encryption algorithms
+//----------------------------------------------------------------------------
+#define PDF_ENCRYPT_KEEP 0
+#define PDF_ENCRYPT_NONE 1
+#define PDF_ENCRYPT_RC4_40 2
+#define PDF_ENCRYPT_RC4_128 3
+#define PDF_ENCRYPT_AES_128 4
+#define PDF_ENCRYPT_AES_256 5
+#define PDF_ENCRYPT_UNKNOWN 6
+
+//----------------------------------------------------------------------------
+// PDF permission codes
+//----------------------------------------------------------------------------
+#define PDF_PERM_PRINT 1 << 2
+#define PDF_PERM_MODIFY 1 << 3
+#define PDF_PERM_COPY 1 << 4
+#define PDF_PERM_ANNOTATE 1 << 5
+#define PDF_PERM_FORM 1 << 8
+#define PDF_PERM_ACCESSIBILITY 1 << 9
+#define PDF_PERM_ASSEMBLE 1 << 10
+#define PDF_PERM_PRINT_HQ 1 << 11
+
+//----------------------------------------------------------------------------
+// PDF Blend Modes
+//----------------------------------------------------------------------------
+#define PDF_BM_Color "Color"
+#define PDF_BM_ColorBurn "ColorBurn"
+#define PDF_BM_ColorDodge "ColorDodge"
+#define PDF_BM_Darken "Darken"
+#define PDF_BM_Difference "Difference"
+#define PDF_BM_Exclusion "Exclusion"
+#define PDF_BM_HardLight "HardLight"
+#define PDF_BM_Hue "Hue"
+#define PDF_BM_Lighten "Lighten"
+#define PDF_BM_Luminosity "Luminosity"
+#define PDF_BM_Multiply "Multiply"
+#define PDF_BM_Normal "Normal"
+#define PDF_BM_Overlay "Overlay"
+#define PDF_BM_Saturation "Saturation"
+#define PDF_BM_Screen "Screen"
+#define PDF_BM_SoftLight "SoftLight"
+
+
+// General text flags
+#define TEXT_FONT_SUPERSCRIPT 1
+#define TEXT_FONT_ITALIC 2
+#define TEXT_FONT_SERIFED 4
+#define TEXT_FONT_MONOSPACED 8
+#define TEXT_FONT_BOLD 16
+
+// UCDN Script codes
+#define UCDN_SCRIPT_COMMON 0
+#define UCDN_SCRIPT_LATIN 1
+#define UCDN_SCRIPT_GREEK 2
+#define UCDN_SCRIPT_CYRILLIC 3
+#define UCDN_SCRIPT_ARMENIAN 4
+#define UCDN_SCRIPT_HEBREW 5
+#define UCDN_SCRIPT_ARABIC 6
+#define UCDN_SCRIPT_SYRIAC 7
+#define UCDN_SCRIPT_THAANA 8
+#define UCDN_SCRIPT_DEVANAGARI 9
+#define UCDN_SCRIPT_BENGALI 10
+#define UCDN_SCRIPT_GURMUKHI 11
+#define UCDN_SCRIPT_GUJARATI 12
+#define UCDN_SCRIPT_ORIYA 13
+#define UCDN_SCRIPT_TAMIL 14
+#define UCDN_SCRIPT_TELUGU 15
+#define UCDN_SCRIPT_KANNADA 16
+#define UCDN_SCRIPT_MALAYALAM 17
+#define UCDN_SCRIPT_SINHALA 18
+#define UCDN_SCRIPT_THAI 19
+#define UCDN_SCRIPT_LAO 20
+#define UCDN_SCRIPT_TIBETAN 21
+#define UCDN_SCRIPT_MYANMAR 22
+#define UCDN_SCRIPT_GEORGIAN 23
+#define UCDN_SCRIPT_HANGUL 24
+#define UCDN_SCRIPT_ETHIOPIC 25
+#define UCDN_SCRIPT_CHEROKEE 26
+#define UCDN_SCRIPT_CANADIAN_ABORIGINAL 27
+#define UCDN_SCRIPT_OGHAM 28
+#define UCDN_SCRIPT_RUNIC 29
+#define UCDN_SCRIPT_KHMER 30
+#define UCDN_SCRIPT_MONGOLIAN 31
+#define UCDN_SCRIPT_HIRAGANA 32
+#define UCDN_SCRIPT_KATAKANA 33
+#define UCDN_SCRIPT_BOPOMOFO 34
+#define UCDN_SCRIPT_HAN 35
+#define UCDN_SCRIPT_YI 36
+#define UCDN_SCRIPT_OLD_ITALIC 37
+#define UCDN_SCRIPT_GOTHIC 38
+#define UCDN_SCRIPT_DESERET 39
+#define UCDN_SCRIPT_INHERITED 40
+#define UCDN_SCRIPT_TAGALOG 41
+#define UCDN_SCRIPT_HANUNOO 42
+#define UCDN_SCRIPT_BUHID 43
+#define UCDN_SCRIPT_TAGBANWA 44
+#define UCDN_SCRIPT_LIMBU 45
+#define UCDN_SCRIPT_TAI_LE 46
+#define UCDN_SCRIPT_LINEAR_B 47
+#define UCDN_SCRIPT_UGARITIC 48
+#define UCDN_SCRIPT_SHAVIAN 49
+#define UCDN_SCRIPT_OSMANYA 50
+#define UCDN_SCRIPT_CYPRIOT 51
+#define UCDN_SCRIPT_BRAILLE 52
+#define UCDN_SCRIPT_BUGINESE 53
+#define UCDN_SCRIPT_COPTIC 54
+#define UCDN_SCRIPT_NEW_TAI_LUE 55
+#define UCDN_SCRIPT_GLAGOLITIC 56
+#define UCDN_SCRIPT_TIFINAGH 57
+#define UCDN_SCRIPT_SYLOTI_NAGRI 58
+#define UCDN_SCRIPT_OLD_PERSIAN 59
+#define UCDN_SCRIPT_KHAROSHTHI 60
+#define UCDN_SCRIPT_BALINESE 61
+#define UCDN_SCRIPT_CUNEIFORM 62
+#define UCDN_SCRIPT_PHOENICIAN 63
+#define UCDN_SCRIPT_PHAGS_PA 64
+#define UCDN_SCRIPT_NKO 65
+#define UCDN_SCRIPT_SUNDANESE 66
+#define UCDN_SCRIPT_LEPCHA 67
+#define UCDN_SCRIPT_OL_CHIKI 68
+#define UCDN_SCRIPT_VAI 69
+#define UCDN_SCRIPT_SAURASHTRA 70
+#define UCDN_SCRIPT_KAYAH_LI 71
+#define UCDN_SCRIPT_REJANG 72
+#define UCDN_SCRIPT_LYCIAN 73
+#define UCDN_SCRIPT_CARIAN 74
+#define UCDN_SCRIPT_LYDIAN 75
+#define UCDN_SCRIPT_CHAM 76
+#define UCDN_SCRIPT_TAI_THAM 77
+#define UCDN_SCRIPT_TAI_VIET 78
+#define UCDN_SCRIPT_AVESTAN 79
+#define UCDN_SCRIPT_EGYPTIAN_HIEROGLYPHS 80
+#define UCDN_SCRIPT_SAMARITAN 81
+#define UCDN_SCRIPT_LISU 82
+#define UCDN_SCRIPT_BAMUM 83
+#define UCDN_SCRIPT_JAVANESE 84
+#define UCDN_SCRIPT_MEETEI_MAYEK 85
+#define UCDN_SCRIPT_IMPERIAL_ARAMAIC 86
+#define UCDN_SCRIPT_OLD_SOUTH_ARABIAN 87
+#define UCDN_SCRIPT_INSCRIPTIONAL_PARTHIAN 88
+#define UCDN_SCRIPT_INSCRIPTIONAL_PAHLAVI 89
+#define UCDN_SCRIPT_OLD_TURKIC 90
+#define UCDN_SCRIPT_KAITHI 91
+#define UCDN_SCRIPT_BATAK 92
+#define UCDN_SCRIPT_BRAHMI 93
+#define UCDN_SCRIPT_MANDAIC 94
+#define UCDN_SCRIPT_CHAKMA 95
+#define UCDN_SCRIPT_MEROITIC_CURSIVE 96
+#define UCDN_SCRIPT_MEROITIC_HIEROGLYPHS 97
+#define UCDN_SCRIPT_MIAO 98
+#define UCDN_SCRIPT_SHARADA 99
+#define UCDN_SCRIPT_SORA_SOMPENG 100
+#define UCDN_SCRIPT_TAKRI 101
+#define UCDN_SCRIPT_UNKNOWN 102
+#define UCDN_SCRIPT_BASSA_VAH 103
+#define UCDN_SCRIPT_CAUCASIAN_ALBANIAN 104
+#define UCDN_SCRIPT_DUPLOYAN 105
+#define UCDN_SCRIPT_ELBASAN 106
+#define UCDN_SCRIPT_GRANTHA 107
+#define UCDN_SCRIPT_KHOJKI 108
+#define UCDN_SCRIPT_KHUDAWADI 109
+#define UCDN_SCRIPT_LINEAR_A 110
+#define UCDN_SCRIPT_MAHAJANI 111
+#define UCDN_SCRIPT_MANICHAEAN 112
+#define UCDN_SCRIPT_MENDE_KIKAKUI 113
+#define UCDN_SCRIPT_MODI 114
+#define UCDN_SCRIPT_MRO 115
+#define UCDN_SCRIPT_NABATAEAN 116
+#define UCDN_SCRIPT_OLD_NORTH_ARABIAN 117
+#define UCDN_SCRIPT_OLD_PERMIC 118
+#define UCDN_SCRIPT_PAHAWH_HMONG 119
+#define UCDN_SCRIPT_PALMYRENE 120
+#define UCDN_SCRIPT_PAU_CIN_HAU 121
+#define UCDN_SCRIPT_PSALTER_PAHLAVI 122
+#define UCDN_SCRIPT_SIDDHAM 123
+#define UCDN_SCRIPT_TIRHUTA 124
+#define UCDN_SCRIPT_WARANG_CITI 125
+#define UCDN_SCRIPT_AHOM 126
+#define UCDN_SCRIPT_ANATOLIAN_HIEROGLYPHS 127
+#define UCDN_SCRIPT_HATRAN 128
+#define UCDN_SCRIPT_MULTANI 129
+#define UCDN_SCRIPT_OLD_HUNGARIAN 130
+#define UCDN_SCRIPT_SIGNWRITING 131
+#define UCDN_SCRIPT_ADLAM 132
+#define UCDN_SCRIPT_BHAIKSUKI 133
+#define UCDN_SCRIPT_MARCHEN 134
+#define UCDN_SCRIPT_NEWA 135
+#define UCDN_SCRIPT_OSAGE 136
+#define UCDN_SCRIPT_TANGUT 137
+#define UCDN_SCRIPT_MASARAM_GONDI 138
+#define UCDN_SCRIPT_NUSHU 139
+#define UCDN_SCRIPT_SOYOMBO 140
+#define UCDN_SCRIPT_ZANABAZAR_SQUARE 141
+#define UCDN_SCRIPT_DOGRA 142
+#define UCDN_SCRIPT_GUNJALA_GONDI 143
+#define UCDN_SCRIPT_HANIFI_ROHINGYA 144
+#define UCDN_SCRIPT_MAKASAR 145
+#define UCDN_SCRIPT_MEDEFAIDRIN 146
+#define UCDN_SCRIPT_OLD_SOGDIAN 147
+#define UCDN_SCRIPT_SOGDIAN 148
+#define UCDN_SCRIPT_ELYMAIC 149
+#define UCDN_SCRIPT_NANDINAGARI 150
+#define UCDN_SCRIPT_NYIAKENG_PUACHUE_HMONG 151
+#define UCDN_SCRIPT_WANCHO 152
+
+%}
+
+%{
+// Global Constants - Python dictionary keys
+PyObject *dictkey_align;
+PyObject *dictkey_bbox;
+PyObject *dictkey_blocks;
+PyObject *dictkey_bpc;
+PyObject *dictkey_c;
+PyObject *dictkey_chars;
+PyObject *dictkey_color;
+PyObject *dictkey_colorspace;
+PyObject *dictkey_content;
+PyObject *dictkey_creationDate;
+PyObject *dictkey_cs_name;
+PyObject *dictkey_da;
+PyObject *dictkey_dashes;
+PyObject *dictkey_desc;
+PyObject *dictkey_dir;
+PyObject *dictkey_effect;
+PyObject *dictkey_ext;
+PyObject *dictkey_filename;
+PyObject *dictkey_fill;
+PyObject *dictkey_flags;
+PyObject *dictkey_font;
+PyObject *dictkey_height;
+PyObject *dictkey_id;
+PyObject *dictkey_image;
+PyObject *dictkey_length;
+PyObject *dictkey_lines;
+PyObject *dictkey_modDate;
+PyObject *dictkey_name;
+PyObject *dictkey_origin;
+PyObject *dictkey_size;
+PyObject *dictkey_smask;
+PyObject *dictkey_spans;
+PyObject *dictkey_stroke;
+PyObject *dictkey_style;
+PyObject *dictkey_subject;
+PyObject *dictkey_text;
+PyObject *dictkey_title;
+PyObject *dictkey_type;
+PyObject *dictkey_ufilename;
+PyObject *dictkey_width;
+PyObject *dictkey_wmode;
+PyObject *dictkey_xref;
+PyObject *dictkey_xres;
+PyObject *dictkey_yres;
+
+%}
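Each permission code is a single-bit mask matching a bit position of the /P entry in the PDF standard security handler (bit 3 = print, bit 4 = modify, and so on, counting from 1). Because these are preprocessor macros, their bodies need parentheses: written as a bare `1 << 4`, an expression such as `PDF_PERM_COPY + PDF_PERM_PRINT` would expand to `1 << 4 + 1 << 2`, which C parses as `(1 << 5) << 2` (128, not 20), and `~PDF_PERM_MODIFY` would expand to `~1 << 3`, left-shifting a negative value. The standalone sketch below is not PyMuPDF code: it redefines the masks with parenthesized bodies, and `has_perm` and `drop_modify` are hypothetical helper names chosen for illustration.

```c
/* Parenthesized permission masks; bit n (counting from 1) of the /P entry
   corresponds to the mask 1 << (n - 1). */
#define PDF_PERM_PRINT         (1 << 2)
#define PDF_PERM_MODIFY        (1 << 3)
#define PDF_PERM_COPY          (1 << 4)
#define PDF_PERM_ANNOTATE      (1 << 5)
#define PDF_PERM_FORM          (1 << 8)
#define PDF_PERM_ACCESSIBILITY (1 << 9)
#define PDF_PERM_ASSEMBLE      (1 << 10)
#define PDF_PERM_PRINT_HQ      (1 << 11)

/* Hypothetical helper: nonzero if every bit of `mask` is set in `p`. */
int has_perm(int p, int mask)
{
    return (p & mask) == mask;
}

/* Hypothetical helper: return `p` with the modify permission cleared.
   The parentheses make ~PDF_PERM_MODIFY mean ~(1 << 3). */
int drop_modify(int p)
{
    return p & ~PDF_PERM_MODIFY;
}
```

For example, with `p = PDF_PERM_PRINT | PDF_PERM_COPY`, `has_perm(p, PDF_PERM_PRINT)` is nonzero and `has_perm(p, PDF_PERM_MODIFY)` is zero.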
diff --git a/fitz/helper-fields.i b/fitz/helper-fields.i
new file mode 100644
index 0000000..3a68ec4
--- /dev/null
+++ b/fitz/helper-fields.i
@@ -0,0 +1,977 @@
+%{
+#define SETATTR(a, v) PyObject_SetAttrString(Widget, a, v)
+#define GETATTR(a) PyObject_GetAttrString(Widget, a)
+#define CALLATTR(m, p) PyObject_CallMethod(Widget, m, p)
+
+static void
+SETATTR_DROP(PyObject *mod, const char *attr, PyObject *value)
+{
+    if (!value)
+        PyObject_DelAttrString(mod, attr);
+    else
+    {
+        PyObject_SetAttrString(mod, attr, value);
+        Py_DECREF(value);
+    }
+}
+
+//-----------------------------------------------------------------------------
+// Functions dealing with PDF form fields (widgets)
+//-----------------------------------------------------------------------------
+enum
+{
+       SigFlag_SignaturesExist = 1,
+       SigFlag_AppendOnly = 2
+};
+
+
+// Make a new PDF action object from JavaScript source.
+// Parameters are a PDF document and a Python string.
+// Returns a new reference to a PDF action object, or NULL.
+//-----------------------------------------------------------------------------
+pdf_obj *
+JM_new_javascript(fz_context *ctx, pdf_document *pdf, PyObject *value)
+{
+    fz_buffer *res = NULL;
+    if (!PyObject_IsTrue(value))  // no argument given
+        return NULL;
+
+    char *data = JM_Python_str_AsChar(value);
+    if (!data)  // not convertible to char*
+        return NULL;
+
+    res = fz_new_buffer_from_copied_data(ctx, data, strlen(data));
+    pdf_obj *source = pdf_add_stream(ctx, pdf, res, NULL, 0);
+    pdf_obj *newaction = pdf_add_new_dict(ctx, pdf, 4);
+    pdf_dict_put_drop(ctx, newaction, PDF_NAME(S), pdf_new_name(ctx, "JavaScript"));
+    pdf_dict_put_drop(ctx, newaction, PDF_NAME(JS), source);
+    JM_Python_str_DelForPy3(data);
+    fz_drop_buffer(ctx, res);
+    return newaction;  // pdf_add_new_dict already returned a new reference
+}
+
+
+// JavaScript extractor
+// Returns the script source, or None if there is none. The parameter is a
+// PDF action dictionary: a script is returned only if /S is /JavaScript and
+// /JS is a non-empty string or stream.
+//-----------------------------------------------------------------------------
+PyObject *
+JM_get_script(fz_context *ctx, pdf_obj *key)
+{
+    pdf_obj *js = NULL;
+    fz_buffer *res = NULL;
+    PyObject *script = NULL;
+    if (!key) Py_RETURN_NONE;
+
+    if (!strcmp(pdf_to_name(ctx,
+                pdf_dict_get(ctx, key, PDF_NAME(S))), "JavaScript")) {
+        js = pdf_dict_get(ctx, key, PDF_NAME(JS));
+    }
+    if (!js) Py_RETURN_NONE;
+
+    if (pdf_is_string(ctx, js)) {
+        script = JM_UnicodeFromStr(pdf_to_text_string(ctx, js));
+    } else if (pdf_is_stream(ctx, js)) {
+        res = pdf_load_stream(ctx, js);
+        script = JM_EscapeStrFromBuffer(ctx, res);
+        fz_drop_buffer(ctx, res);
+    } else {
+        Py_RETURN_NONE;
+    }
+    if (PyObject_IsTrue(script)) { // do not return an empty script
+        return script;
+    }
+    Py_CLEAR(script);
+    Py_RETURN_NONE;
+}
+
+
+// Store, replace, or delete a JavaScript PDF action.
+// Usable for any object type that supports PDF actions, although the
+// parameter is named annot_obj. With only key1, the action is stored under
+// that key (e.g. /A); with key1 and key2, under the nested path (e.g. /AA/K).
+//-----------------------------------------------------------------------------
+void JM_put_script(fz_context *ctx, pdf_obj *annot_obj, pdf_obj *key1, pdf_obj *key2, PyObject *value)
+{
+    PyObject *script = NULL;
+    pdf_obj *key1_obj = pdf_dict_get(ctx, annot_obj, key1);
+    pdf_document *pdf = pdf_get_bound_document(ctx, annot_obj);  // owning PDF
+
+    // if no new script given, just delete corresponding key
+    if (!value || !PyObject_IsTrue(value)) {
+        if (!key2) {
+            pdf_dict_del(ctx, annot_obj, key1);
+        } else if (key1_obj) {
+            pdf_dict_del(ctx, key1_obj, key2);
+        }
+        return;
+    }
+
+    // read any existing script as a PyUnicode string
+    if (!key2 || !key1_obj) {
+        script = JM_get_script(ctx, key1_obj);
+    } else {
+        script = JM_get_script(ctx, pdf_dict_get(ctx, key1_obj, key2));
+    }
+
+    // replace old script, if different from new one
+    if (!PyObject_RichCompareBool(value, script, Py_EQ)) {
+        pdf_obj *newaction = JM_new_javascript(ctx, pdf, value);
+        if (!key2) {
+            pdf_dict_put_drop(ctx, annot_obj, key1, newaction);
+        } else {
+            pdf_dict_putl_drop(ctx, annot_obj, newaction, key1, key2, NULL);
+        }
+    }
+    Py_XDECREF(script);
+    return;
+}
+
+/*
+// Execute a JavaScript action for annot or field.
+//-----------------------------------------------------------------------------
+PyObject *
+JM_exec_script(fz_context *ctx, pdf_obj *annot_obj, pdf_obj *key1, pdf_obj *key2)
+{
+    PyObject *script = NULL;
+    char *code = NULL;
+    fz_try(ctx) {
+        pdf_document *pdf = pdf_get_bound_document(ctx, annot_obj);
+        pdf_obj *key1_obj = pdf_dict_get(ctx, annot_obj, key1);
+        char buf[100];
+        if (!key2) {
+            script = JM_get_script(ctx, key1_obj);
+        } else {
+            script = JM_get_script(ctx, pdf_dict_get(ctx, key1_obj, key2));
+        }
+        code = JM_Python_str_AsChar(script);
+        fz_snprintf(buf, sizeof buf, "%d/A", pdf_to_num(ctx, annot_obj));
+        pdf_js_execute(pdf->js, buf, code);
+    }
+    fz_always(ctx) {
+        JM_Python_str_DelForPy3(code);
+        Py_XDECREF(script);
+    }
+    fz_catch(ctx) {
+        Py_RETURN_FALSE;
+    }
+    Py_RETURN_TRUE;
+}
+*/
+
+// String from widget type
+//-----------------------------------------------------------------------------
+char *JM_field_type_text(int wtype)
+{
+    switch(wtype) {
+        case(PDF_WIDGET_TYPE_BUTTON):
+            return "Button";
+        case(PDF_WIDGET_TYPE_CHECKBOX):
+            return "CheckBox";
+        case(PDF_WIDGET_TYPE_RADIOBUTTON):
+            return "RadioButton";
+        case(PDF_WIDGET_TYPE_TEXT):
+            return "Text";
+        case(PDF_WIDGET_TYPE_LISTBOX):
+            return "ListBox";
+        case(PDF_WIDGET_TYPE_COMBOBOX):
+            return "ComboBox";
+        case(PDF_WIDGET_TYPE_SIGNATURE):
+            return "Signature";
+        default:
+            return "unknown";
+    }
+}
+
+// Set the field type
+//-----------------------------------------------------------------------------
+void JM_set_field_type(fz_context *ctx, pdf_document *doc, pdf_obj *obj, int type)
+{
+       int setbits = 0;
+       int clearbits = 0;
+       pdf_obj *typename = NULL;
+
+       switch(type) {
+       case PDF_WIDGET_TYPE_BUTTON:
+               typename = PDF_NAME(Btn);
+               setbits = PDF_BTN_FIELD_IS_PUSHBUTTON;
+               break;
+       case PDF_WIDGET_TYPE_CHECKBOX:
+               typename = PDF_NAME(Btn);
+               clearbits = (PDF_BTN_FIELD_IS_PUSHBUTTON|PDF_BTN_FIELD_IS_RADIO); // check boxes set neither flag
+               break;
+       case PDF_WIDGET_TYPE_RADIOBUTTON:
+               typename = PDF_NAME(Btn);
+               clearbits = PDF_BTN_FIELD_IS_PUSHBUTTON;
+               setbits = PDF_BTN_FIELD_IS_RADIO;
+               break;
+       case PDF_WIDGET_TYPE_TEXT:
+               typename = PDF_NAME(Tx);
+               break;
+       case PDF_WIDGET_TYPE_LISTBOX:
+               typename = PDF_NAME(Ch);
+               clearbits = PDF_CH_FIELD_IS_COMBO;
+               break;
+       case PDF_WIDGET_TYPE_COMBOBOX:
+               typename = PDF_NAME(Ch);
+               setbits = PDF_CH_FIELD_IS_COMBO;
+               break;
+       case PDF_WIDGET_TYPE_SIGNATURE:
+               typename = PDF_NAME(Sig);
+               break;
+       }
+
+       if (typename)
+               pdf_dict_put_drop(ctx, obj, PDF_NAME(FT), typename);
+
+       if (setbits != 0 || clearbits != 0) {
+               int bits = pdf_dict_get_int(ctx, obj, PDF_NAME(Ff));
+               bits &= ~clearbits;
+               bits |= setbits;
+               pdf_dict_put_int(ctx, obj, PDF_NAME(Ff), bits);
+       }
+}
+
+// Copied from MuPDF v1.14
+// Create widget
+//-----------------------------------------------------------------------------
+pdf_annot *JM_create_widget(fz_context *ctx, pdf_document *doc, pdf_page *page, int type, char *fieldname)
+{
+       pdf_obj *form = NULL;
+       int old_sigflags = pdf_to_int(ctx, pdf_dict_getp(ctx, pdf_trailer(ctx, doc), "Root/AcroForm/SigFlags"));
+       pdf_annot *annot = pdf_create_annot_raw(ctx, page, PDF_ANNOT_WIDGET);
+
+       fz_try(ctx) {
+               JM_set_field_type(ctx, doc, annot->obj, type);
+               pdf_dict_put_text_string(ctx, annot->obj, PDF_NAME(T), fieldname);
+
+               if (type == PDF_WIDGET_TYPE_SIGNATURE) {
+                       int sigflags = (old_sigflags | (SigFlag_SignaturesExist|SigFlag_AppendOnly));
+                       pdf_dict_putl_drop(ctx, pdf_trailer(ctx, doc), pdf_new_int(ctx, sigflags), PDF_NAME(Root), PDF_NAME(AcroForm), PDF_NAME(SigFlags), NULL);
+               }
+
+               /*
+               pdf_create_annot_raw has linked the new widget into the page's
+               /Annots array; it must also be added to Root/AcroForm/Fields.
+               */
+               form = pdf_dict_getp(ctx, pdf_trailer(ctx, doc), "Root/AcroForm/Fields");
+               if (!form) {
+                       form = pdf_new_array(ctx, doc, 1);
+                       pdf_dict_putl_drop(ctx, pdf_trailer(ctx, doc),
+                               form,
+                               PDF_NAME(Root),
+                               PDF_NAME(AcroForm),
+                               PDF_NAME(Fields),
+                               NULL);
+               }
+
+               pdf_array_push(ctx, form, annot->obj); // Cleanup relies on this statement being last
+       }
+       fz_catch(ctx) {
+               pdf_delete_annot(ctx, page, annot);
+
+               if (type == PDF_WIDGET_TYPE_SIGNATURE) {
+                       pdf_dict_putl_drop(ctx, pdf_trailer(ctx, doc), pdf_new_int(ctx, old_sigflags), PDF_NAME(Root), PDF_NAME(AcroForm), PDF_NAME(SigFlags), NULL);
+               }
+
+               fz_rethrow(ctx);
+       }
+
+       return annot;
+}
+
+
+// PushButton get state
+//-----------------------------------------------------------------------------
+PyObject *JM_pushbtn_state(fz_context *ctx, pdf_annot *annot)
+{   // push buttons store no on/off state in the PDF,
+    // so always report them as untouched
+    Py_RETURN_FALSE;
+}
+
+// CheckBox get state
+//-----------------------------------------------------------------------------
+PyObject *JM_checkbox_state(fz_context *ctx, pdf_annot *annot)
+{
+    pdf_obj *leafv = pdf_dict_get_inheritable(ctx, annot->obj, PDF_NAME(V));
+    pdf_obj *leafas = pdf_dict_get_inheritable(ctx, annot->obj, PDF_NAME(AS));
+    if (!leafv) Py_RETURN_FALSE;
+    if (leafv == PDF_NAME(Off)) Py_RETURN_FALSE;
+    if (pdf_is_name(ctx, leafv) && !strcmp(pdf_to_name(ctx, leafv), "Yes"))
+        Py_RETURN_TRUE;
+    if (pdf_is_string(ctx, leafv) && !strcmp(pdf_to_text_string(ctx, leafv), "Off"))
+        Py_RETURN_FALSE;
+    if (pdf_is_string(ctx, leafv) && !strcmp(pdf_to_text_string(ctx, leafv), "Yes"))
+        Py_RETURN_TRUE;
+    if (leafas && leafas == PDF_NAME(Off)) Py_RETURN_FALSE;
+    Py_RETURN_TRUE;
+}
+
+// RadioButton get state
+//-----------------------------------------------------------------------------
+PyObject *JM_radiobtn_state(fz_context *ctx, pdf_annot *annot)
+{   // MuPDF treats radio buttons like check boxes - hence so do we
+    return JM_checkbox_state(ctx, annot);
+}
+
+// Text field retrieve value
+//-----------------------------------------------------------------------------
+PyObject *JM_text_value(fz_context *ctx, pdf_annot *annot)
+{
+    const char *text = NULL;
+    fz_var(text);
+    fz_try(ctx)
+        text = pdf_field_value(ctx, annot->obj);
+    fz_catch(ctx) Py_RETURN_NONE;
+    return JM_UnicodeFromStr(text);
+}
+
+// ListBox retrieve value
+//-----------------------------------------------------------------------------
+PyObject *JM_listbox_value(fz_context *ctx, pdf_annot *annot)
+{
+    int i = 0, n = 0;
+    // may be single value or array
+    pdf_obj *optarr = pdf_dict_get(ctx, annot->obj, PDF_NAME(V));
+    if (pdf_is_string(ctx, optarr))         // a single string
+        return JM_UnicodeFromStr(pdf_to_text_string(ctx, optarr));
+
+    // value is an array (may have len 0)
+    n = pdf_array_len(ctx, optarr);
+    PyObject *liste = PyList_New(0);
+
+    // extract a list of strings
+    // each entry may again be an array: take second entry then
+    for (i = 0; i < n; i++) {
+        pdf_obj *elem = pdf_array_get(ctx, optarr, i);
+        if (pdf_is_array(ctx, elem))
+            elem = pdf_array_get(ctx, elem, 1);
+        LIST_APPEND_DROP(liste, JM_UnicodeFromStr(pdf_to_text_string(ctx, elem)));
+    }
+    return liste;
+}
+
+// ComboBox retrieve value
+//-----------------------------------------------------------------------------
+PyObject *JM_combobox_value(fz_context *ctx, pdf_annot *annot)
+{   // combobox treated like listbox
+    return JM_listbox_value(ctx, annot);
+}
+
+// Signature field retrieve value
+PyObject *JM_signature_value(fz_context *ctx, pdf_annot *annot)
+{   // signatures are currently not supported
+    Py_RETURN_NONE;
+}
+
+// retrieve ListBox / ComboBox choice values
+//-----------------------------------------------------------------------------
+PyObject *JM_choice_options(fz_context *ctx, pdf_annot *annot)
+{   // return list of choices for list or combo boxes
+    PyObject *val;
+    int n = pdf_choice_widget_options(ctx, (pdf_widget *) annot, 0, NULL);
+    if (n == 0) Py_RETURN_NONE;                     // wrong widget type
+
+    pdf_obj *optarr = pdf_dict_get(ctx, annot->obj, PDF_NAME(Opt));
+    int i;
+    PyObject *liste = PyList_New(0);
+
+    for (i = 0; i < n; i++) {
+        pdf_obj *opt = pdf_array_get(ctx, optarr, i);
+        if (pdf_array_len(ctx, opt) == 2) {
+            // [export value, display text] pair
+            val = Py_BuildValue("ss",
+                    pdf_to_text_string(ctx, pdf_array_get(ctx, opt, 0)),
+                    pdf_to_text_string(ctx, pdf_array_get(ctx, opt, 1)));
+        } else {
+            // plain text string
+            val = JM_UnicodeFromStr(pdf_to_text_string(ctx, opt));
+        }
+        LIST_APPEND_DROP(liste, val);
+    }
+    return liste;
+}
+
+// set ListBox / ComboBox values
+//-----------------------------------------------------------------------------
+void JM_set_choice_options(fz_context *ctx, pdf_annot *annot, PyObject *liste)
+{
+    if (!liste) return;
+    if (!PySequence_Check(liste)) return;
+    Py_ssize_t i, n = PySequence_Size(liste);
+    if (n < 1) return;
+    pdf_document *pdf = pdf_get_bound_document(ctx, annot->obj);
+    char *opt = NULL;
+    pdf_obj *optarr = pdf_new_array(ctx, pdf, n);
+    PyObject *val = NULL;
+    for (i = 0; i < n; i++) {
+        val = PySequence_ITEM(liste, i);
+        opt = JM_Python_str_AsChar(val);
+        pdf_array_push_text_string(ctx, optarr, (const char *) opt);
+        JM_Python_str_DelForPy3(opt);
+        Py_CLEAR(val);
+    }
+
+    pdf_dict_put_drop(ctx, annot->obj, PDF_NAME(Opt), optarr);
+    return;
+}
+
+//-----------------------------------------------------------------------------
+// Populate a Python Widget object with the values from a PDF form field.
+// Called by "Page.firstWidget" and "Widget.next".
+//-----------------------------------------------------------------------------
+void JM_get_widget_properties(fz_context *ctx, pdf_annot *annot, PyObject *Widget)
+{
+    pdf_document *pdf = annot->page->doc;
+    pdf_widget *tw = (pdf_widget *) annot;
+    pdf_obj *obj = NULL, *js = NULL, *o = NULL;
+    fz_buffer *res = NULL;
+    Py_ssize_t i = 0, n = 0;
+    fz_try(ctx) {
+        int field_type = pdf_widget_type(ctx, tw);
+        SETATTR_DROP(Widget, "field_type", Py_BuildValue("i", field_type));
+        if (field_type == PDF_WIDGET_TYPE_SIGNATURE) {
+            if (pdf_signature_is_signed(ctx, pdf, annot->obj)) {
+                SETATTR("is_signed", Py_True);
+            } else {
+                SETATTR("is_signed", Py_False);
+            }
+        } else {
+            SETATTR("is_signed", Py_None);
+        }
+        SETATTR_DROP(Widget, "border_style",
+                JM_UnicodeFromStr(pdf_field_border_style(ctx, annot->obj)));
+        SETATTR_DROP(Widget, "field_type_string",
+                JM_UnicodeFromStr(JM_field_type_text(field_type)));
+
+        char *field_name = pdf_field_name(ctx, annot->obj);
+        SETATTR_DROP(Widget, "field_name", JM_UnicodeFromStr(field_name));
+        JM_Free(field_name);
+
+        const char *label = NULL;
+        obj = pdf_dict_get(ctx, annot->obj, PDF_NAME(TU));
+        if (obj) label = pdf_to_text_string(ctx, obj);
+        SETATTR_DROP(Widget, "field_label", JM_UnicodeFromStr(label));
+
+        SETATTR_DROP(Widget, "field_value",
+                JM_UnicodeFromStr(pdf_field_value(ctx, annot->obj)));
+
+        SETATTR_DROP(Widget, "field_display",
+                Py_BuildValue("i", pdf_field_display(ctx, annot->obj)));
+
+        float border_width = pdf_to_real(ctx, pdf_dict_getl(ctx, annot->obj,
+                                PDF_NAME(BS), PDF_NAME(W), NULL));
+        if (border_width == 0) border_width = 1;
+        SETATTR_DROP(Widget, "border_width",
+                Py_BuildValue("f", border_width));
+
+        obj = pdf_dict_getl(ctx, annot->obj,
+                                PDF_NAME(BS), PDF_NAME(D), NULL);
+        if (pdf_is_array(ctx, obj)) {
+            n = (Py_ssize_t) pdf_array_len(ctx, obj);
+            PyObject *d = PyList_New(n);
+            for (i = 0; i < n; i++) {
+                PyList_SET_ITEM(d, i, Py_BuildValue("i", pdf_to_int(ctx,
+                                pdf_array_get(ctx, obj, (int) i))));
+            }
+            SETATTR_DROP(Widget, "border_dashes", d);
+        }
+
+        SETATTR_DROP(Widget, "text_maxlen",
+                Py_BuildValue("i", pdf_text_widget_max_len(ctx, tw)));
+
+        SETATTR_DROP(Widget, "text_format",
+                Py_BuildValue("i", pdf_text_widget_format(ctx, tw)));
+
+        obj = pdf_dict_getl(ctx, annot->obj, PDF_NAME(MK), PDF_NAME(BG), NULL);
+        if (pdf_is_array(ctx, obj)) {
+            n = (Py_ssize_t) pdf_array_len(ctx, obj);
+            PyObject *col = PyList_New(n);
+            for (i = 0; i < n; i++) {
+                PyList_SET_ITEM(col, i, Py_BuildValue("f",
+                pdf_to_real(ctx, pdf_array_get(ctx, obj, (int) i))));
+            }
+            SETATTR_DROP(Widget, "fill_color", col);
+        }
+
+        obj = pdf_dict_getl(ctx, annot->obj, PDF_NAME(MK), PDF_NAME(BC), NULL);
+        if (pdf_is_array(ctx, obj)) {
+            n = (Py_ssize_t) pdf_array_len(ctx, obj);
+            PyObject *col = PyList_New(n);
+            for (i = 0; i < n; i++) {
+                PyList_SET_ITEM(col, i, Py_BuildValue("f",
+                pdf_to_real(ctx, pdf_array_get(ctx, obj, (int) i))));
+            }
+            SETATTR_DROP(Widget, "border_color", col);
+        }
+
+        SETATTR_DROP(Widget, "choice_values", JM_choice_options(ctx, annot));
+
+        const char *da = pdf_to_text_string(ctx, pdf_dict_get_inheritable(ctx,
+                                        annot->obj, PDF_NAME(DA)));
+        SETATTR_DROP(Widget, "_text_da", JM_UnicodeFromStr(da));
+
+        obj = pdf_dict_getl(ctx, annot->obj, PDF_NAME(MK), PDF_NAME(CA), NULL);
+        if (obj) {
+            SETATTR_DROP(Widget, "button_caption",
+                    JM_UnicodeFromStr((char *)pdf_to_text_string(ctx, obj)));
+        }
+
+        SETATTR_DROP(Widget, "field_flags",
+                Py_BuildValue("i", pdf_field_flags(ctx, annot->obj)));
+
+        // call Py method to reconstruct text color, font name, size
+        PyObject *call = CALLATTR("_parse_da", NULL);
+        Py_XDECREF(call);
+
+        // extract JavaScript action texts
+        SETATTR_DROP(Widget, "script",
+            JM_get_script(ctx, pdf_dict_get(ctx, annot->obj, PDF_NAME(A))));
+
+        SETATTR_DROP(Widget, "script_stroke",
+            JM_get_script(ctx, pdf_dict_getl(ctx, annot->obj, PDF_NAME(AA), PDF_NAME(K), NULL)));
+
+        SETATTR_DROP(Widget, "script_format",
+            JM_get_script(ctx, pdf_dict_getl(ctx, annot->obj, PDF_NAME(AA), PDF_NAME(F), NULL)));
+
+        SETATTR_DROP(Widget, "script_change",
+            JM_get_script(ctx, pdf_dict_getl(ctx, annot->obj, PDF_NAME(AA), PDF_NAME(V), NULL)));
+
+        SETATTR_DROP(Widget, "script_calc",
+            JM_get_script(ctx, pdf_dict_getl(ctx, annot->obj, PDF_NAME(AA), PDF_NAME(C), NULL)));
+    }
+    fz_always(ctx) PyErr_Clear();
+    fz_catch(ctx) fz_rethrow(ctx);
+    return;
+}
+
+
+//-----------------------------------------------------------------------------
+// Update the PDF form field with the properties from a Python Widget object.
+// Called by "Page.addWidget" and "Annot.updateWidget".
+//-----------------------------------------------------------------------------
+void JM_set_widget_properties(fz_context *ctx, pdf_annot *annot, PyObject *Widget)
+{
+    pdf_document *pdf = annot->page->doc;
+    pdf_page *page = annot->page;
+    fz_rect rect;
+    pdf_obj *fill_col = NULL, *border_col = NULL;
+    pdf_obj *dashes = NULL;
+    Py_ssize_t i, n = 0;
+    int d;
+    int result = 0;
+    PyObject *value = GETATTR("field_type");
+    int field_type = (int) PyInt_AsLong(value);
+    Py_DECREF(value);
+
+    // rectangle --------------------------------------------------------------
+    value = GETATTR("rect");
+    rect = JM_rect_from_py(value);
+    Py_XDECREF(value);
+    fz_matrix rot_mat = JM_rotate_page_matrix(ctx, page);
+    rect = fz_transform_rect(rect, rot_mat);
+    pdf_set_annot_rect(ctx, annot, rect);
+
+    // fill color -------------------------------------------------------------
+    value = GETATTR("fill_color");
+    if (value && PySequence_Check(value)) {
+        n = PySequence_Size(value);
+        fill_col = pdf_new_array(ctx, pdf, n);
+        float col = 0;
+        for (i = 0; i < n; i++) {
+            JM_FLOAT_ITEM(value, i, &col);
+            pdf_array_push_real(ctx, fill_col, col);
+        }
+        pdf_field_set_fill_color(ctx, annot->obj, fill_col);
+        pdf_drop_obj(ctx, fill_col);
+    }
+    Py_XDECREF(value);
+
+    // dashes -----------------------------------------------------------------
+    value = GETATTR("border_dashes");
+    if (value && PySequence_Check(value)) {
+        n = PySequence_Size(value);
+        dashes = pdf_new_array(ctx, pdf, n);
+        for (i = 0; i < n; i++) {
+            PyObject *item = PySequence_ITEM(value, i);  // new reference
+            pdf_array_push_int(ctx, dashes, (int64_t) PyInt_AsLong(item));
+            Py_XDECREF(item);
+        }
+        pdf_dict_putl_drop(ctx, annot->obj, dashes,
+                                PDF_NAME(BS),
+                                PDF_NAME(D),
+                                NULL);
+    }
+    Py_XDECREF(value);
+
+    // border color -----------------------------------------------------------
+    value = GETATTR("border_color");
+    if (value && PySequence_Check(value)) {
+        n = PySequence_Size(value);
+        border_col = pdf_new_array(ctx, pdf, n);
+        float col = 0;
+        for (i = 0; i < n; i++) {
+            JM_FLOAT_ITEM(value, i, &col);
+            pdf_array_push_real(ctx, border_col, col);
+        }
+        pdf_dict_putl_drop(ctx, annot->obj, border_col,
+                                PDF_NAME(MK),
+                                PDF_NAME(BC),
+                                NULL);
+    }
+    Py_XDECREF(value);
+
+    // entry ignored - may be used later
+    /*
+    int text_format = (int) PyInt_AsLong(GETATTR("text_format"));
+    */
+
+    // field label -----------------------------------------------------------
+    value = GETATTR("field_label");
+    if (value != Py_None) {
+        char *label = JM_Python_str_AsChar(value);
+        pdf_dict_put_text_string(ctx, annot->obj, PDF_NAME(TU), label);
+        JM_Python_str_DelForPy3(label);
+    }
+    Py_XDECREF(value);
+
+    // field name -------------------------------------------------------------
+    value = GETATTR("field_name");
+    if (value != Py_None) {
+        char *name = JM_Python_str_AsChar(value);
+        char *old_name = pdf_field_name(ctx, annot->obj);
+        if (strcmp(name, old_name) != 0) {
+            pdf_dict_put_text_string(ctx, annot->obj, PDF_NAME(T), name);
+        }
+        JM_Python_str_DelForPy3(name);
+        JM_Free(old_name);
+    }
+    Py_XDECREF(value);
+
+    // max text len -----------------------------------------------------------
+    if (field_type == PDF_WIDGET_TYPE_TEXT)
+    {
+        value = GETATTR("text_maxlen");
+        int text_maxlen = (int) PyInt_AsLong(value);
+        if (text_maxlen) {
+            pdf_dict_put_int(ctx, annot->obj, PDF_NAME(MaxLen), text_maxlen);
+        }
+        Py_XDECREF(value);
+    }
+    value = GETATTR("field_display");
+    d = (int) PyInt_AsLong(value);
+    Py_XDECREF(value);
+    pdf_field_set_display(ctx, annot->obj, d);
+
+    // choice values ----------------------------------------------------------
+    if (field_type == PDF_WIDGET_TYPE_LISTBOX ||
+        field_type == PDF_WIDGET_TYPE_COMBOBOX) {
+        value = GETATTR("choice_values");
+        JM_set_choice_options(ctx, annot, value);
+        Py_XDECREF(value);
+    }
+
+    // border style -----------------------------------------------------------
+    value = GETATTR("border_style");
+    pdf_obj *val = JM_get_border_style(ctx, value);
+    Py_XDECREF(value);
+    pdf_dict_putl_drop(ctx, annot->obj, val,
+                            PDF_NAME(BS),
+                            PDF_NAME(S),
+                            NULL);
+
+    // border width -----------------------------------------------------------
+    value = GETATTR("border_width");
+    float border_width = (float) PyFloat_AsDouble(value);
+    Py_XDECREF(value);
+    pdf_dict_putl_drop(ctx, annot->obj, pdf_new_real(ctx, border_width),
+                            PDF_NAME(BS),
+                            PDF_NAME(W),
+                            NULL);
+
+    // /DA string -------------------------------------------------------------
+    value = GETATTR("_text_da");
+    char *da = JM_Python_str_AsChar(value);
+    Py_XDECREF(value);
+    pdf_dict_put_text_string(ctx, annot->obj, PDF_NAME(DA), da);
+    JM_Python_str_DelForPy3(da);
+    pdf_dict_del(ctx, annot->obj, PDF_NAME(DS)); /* not supported by MuPDF */
+    pdf_dict_del(ctx, annot->obj, PDF_NAME(RC)); /* not supported by MuPDF */
+
+    // field flags ------------------------------------------------------------
+    int field_flags = 0, Ff = 0;
+    if (field_type != PDF_WIDGET_TYPE_CHECKBOX) {
+        value = GETATTR("field_flags");
+        field_flags = (int) PyInt_AsLong(value);
+        if (!PyErr_Occurred()) {
+            Ff = pdf_field_flags(ctx, annot->obj);
+            Ff |= field_flags;
+        }
+        Py_XDECREF(value);
+    }
+    pdf_dict_put_int(ctx, annot->obj, PDF_NAME(Ff), Ff);
+
+    // button caption ---------------------------------------------------------
+    value = GETATTR("button_caption");
+    char *ca = JM_Python_str_AsChar(value);
+    Py_XDECREF(value);
+    if (ca) {
+        pdf_field_set_button_caption(ctx, annot->obj, ca);
+        JM_Python_str_DelForPy3(ca);
+    }
+
+    // script (/A) -------------------------------------------------------
+    value = GETATTR("script");
+    JM_put_script(ctx, annot->obj, PDF_NAME(A), NULL, value);
+    Py_CLEAR(value);
+
+    // script (/AA/K) -------------------------------------------------------
+    value = GETATTR("script_stroke");
+    JM_put_script(ctx, annot->obj, PDF_NAME(AA), PDF_NAME(K), value);
+    Py_CLEAR(value);
+
+    // script (/AA/F) -------------------------------------------------------
+    value = GETATTR("script_format");
+    JM_put_script(ctx, annot->obj, PDF_NAME(AA), PDF_NAME(F), value);
+    Py_CLEAR(value);
+
+    // script (/AA/V) -------------------------------------------------------
+    value = GETATTR("script_change");
+    JM_put_script(ctx, annot->obj, PDF_NAME(AA), PDF_NAME(V), value);
+    Py_CLEAR(value);
+
+    // script (/AA/C) -------------------------------------------------------
+    value = GETATTR("script_calc");
+    JM_put_script(ctx, annot->obj, PDF_NAME(AA), PDF_NAME(C), value);
+    Py_CLEAR(value);
+
+    // field value ------------------------------------------------------------
+    // MuPDF function "pdf_set_field_value" always sets strings. For button
+    // fields this may lead to an unrecognized state for some PDF viewers.
+    //-------------------------------------------------------------------------
+    value = GETATTR("field_value");
+    char *text = NULL;
+    switch(field_type)
+    {
+    case PDF_WIDGET_TYPE_CHECKBOX:
+    case PDF_WIDGET_TYPE_RADIOBUTTON:
+        if (PyObject_RichCompareBool(value, Py_True, Py_EQ)) {
+            result = pdf_set_field_value(ctx, pdf, annot->obj, "Yes", 1);
+            pdf_dict_put_name(ctx, annot->obj, PDF_NAME(V), "Yes");
+        } else {
+            result = pdf_set_field_value(ctx, pdf, annot->obj, "Off", 1);
+            pdf_dict_put(ctx, annot->obj, PDF_NAME(V), PDF_NAME(Off));
+        }
+        break;
+    default:
+        text = JM_Python_str_AsChar(value);
+        if (text) {
+            result = pdf_set_field_value(ctx, pdf, annot->obj, (const char *)text, 1);
+            JM_Python_str_DelForPy3(text);
+        }
+    }
+    Py_CLEAR(value);
+    PyErr_Clear();
+    pdf_dirty_annot(ctx, annot);
+    annot->is_hot = 1;
+    annot->is_active = 1;
+    pdf_update_appearance(ctx, annot);
+}
+#undef SETATTR
+#undef GETATTR
+#undef CALLATTR
+%}
+
+%pythoncode %{
+#------------------------------------------------------------------------------
+# Class describing a PDF form field ("widget")
+#------------------------------------------------------------------------------
+class Widget(object):
+    def __init__(self):
+        self.border_color = None
+        self.border_style = "S"
+        self.border_width = 0
+        self.border_dashes = None
+        self.choice_values = None  # choice fields only
+
+        self.field_name = None  # field name
+        self.field_label = None  # field label
+        self.field_value = None
+        self.field_flags = None
+        self.field_display = 0
+        self.field_type = 0  # valid range 1 through 7
+        self.field_type_string = None  # field type as string
+
+        self.fill_color = None
+        self.button_caption = None  # button caption
+        self.is_signed = None  # True / False if signature
+        self.text_color = (0, 0, 0)
+        self.text_font = "Helv"
+        self.text_fontsize = 0
+        self.text_maxlen = 0  # text fields only
+        self.text_format = 0  # text fields only
+        self._text_da = ""  # /DA = default appearance
+
+        self.script = None  # JavaScript (/A)
+        self.script_stroke = None  # JavaScript (/AA/K)
+        self.script_format = None  # JavaScript (/AA/F)
+        self.script_change = None  # JavaScript (/AA/V)
+        self.script_calc = None  # JavaScript (/AA/C)
+
+        self.rect = None  # annot value
+        self.xref = 0  # annot value
+
+
+    def _validate(self):
+        """Validate the class entries.
+        """
+        if (self.rect.isInfinite
+            or self.rect.isEmpty
+           ):
+            raise ValueError("bad rect")
+
+        if not self.field_name:
+            raise ValueError("field name missing")
+
+        if self.field_label == "Unnamed":
+            self.field_label = None
+        CheckColor(self.border_color)
+        CheckColor(self.fill_color)
+        if not self.text_color:
+            self.text_color = (0, 0, 0)
+        CheckColor(self.text_color)
+
+        if not self.border_width:
+            self.border_width = 0
+
+        if not self.text_fontsize:
+            self.text_fontsize = 0
+
+        self.border_style = self.border_style.upper()[0:1]
+
+        # standardize content of JavaScript entries
+        btn_type = self.field_type in (
+            PDF_WIDGET_TYPE_BUTTON,
+            PDF_WIDGET_TYPE_CHECKBOX,
+            PDF_WIDGET_TYPE_RADIOBUTTON
+        )
+        if not self.script:
+            self.script = None
+        elif type(self.script) not in string_types:
+            raise ValueError("script content must be unicode")
+
+        # buttons cannot have the following script actions
+        if btn_type or not self.script_calc:
+            self.script_calc = None
+        elif type(self.script_calc) not in string_types:
+            raise ValueError("script_calc content must be unicode")
+
+        if btn_type or not self.script_change:
+            self.script_change = None
+        elif type(self.script_change) not in string_types:
+            raise ValueError("script_change content must be unicode")
+
+        if btn_type or not self.script_format:
+            self.script_format = None
+        elif type(self.script_format) not in string_types:
+            raise ValueError("script_format content must be unicode")
+
+        if btn_type or not self.script_stroke:
+            self.script_stroke = None
+        elif type(self.script_stroke) not in string_types:
+            raise ValueError("script_stroke content must be unicode")
+
+        self._checker()  # any field_type specific checks
+
+
+    def _adjust_font(self):
+        """Ensure text_font is from our list and correctly spelled.
+        """
+        if not self.text_font:
+            self.text_font = "Helv"
+            return
+        valid_fonts = ("Cour", "TiRo", "Helv", "ZaDb")
+        for f in valid_fonts:
+            if self.text_font.lower() == f.lower():
+                self.text_font = f
+                return
+        self.text_font = "Helv"
+        return
+
+
+    def _parse_da(self):
+        """Extract font name, size and color from default appearance string (/DA object).
+
+        Modeled on the 'pdf_parse_default_appearance' function in MuPDF's 'pdf-annot.c'.
+        """
+        if not self._text_da:
+            return
+        font = "Helv"
+        fsize = 0
+        col = (0, 0, 0)
+        dat = self._text_da.split()  # split on any whitespace
+        for i, item in enumerate(dat):
+            if item == "Tf":
+                font = dat[i - 2][1:]
+                fsize = float(dat[i - 1])
+                dat[i] = dat[i-1] = dat[i-2] = ""
+                continue
+            if item == "g":  # unicolor text
+                col = [(float(dat[i - 1]))]
+                dat[i] = dat[i-1] = ""
+                continue
+            if item == "rg":  # RGB colored text
+                col = [float(f) for f in dat[i - 3:i]]
+                dat[i] = dat[i-1] = dat[i-2] = dat[i-3] = ""
+                continue
+        self.text_font = font
+        self.text_fontsize = fsize
+        self.text_color = col
+        self._text_da = ""
+        return
+
+
+    def _checker(self):
+        """Any widget type checks.
+        """
+        if self.field_type not in range(1, 8):
+            raise ValueError("bad field type")
+
+
+    def update(self):
+        """Reflect Python object in the PDF.
+        """
+        doc = self.parent.parent
+        self._validate()
+
+        self._adjust_font()  # ensure valid text_font name
+
+        # now create the /DA string
+        self._text_da = ""
+        if   len(self.text_color) == 3:
+            fmt = "{:g} {:g} {:g} rg /{f:s} {s:g} Tf"
+        elif len(self.text_color) == 1:
+            fmt = "{:g} g /{f:s} {s:g} Tf"
+        elif len(self.text_color) == 4:
+            fmt = "{:g} {:g} {:g} {:g} k /{f:s} {s:g} Tf"
+        self._text_da = fmt.format(*self.text_color, f=self.text_font,
+                                    s=self.text_fontsize)
+        # finally update the widget
+
+        TOOLS._save_widget(self._annot, self)
+        self._text_da = ""
+
+    def reset(self):
+        """Reset the field value to its default.
+        """
+        TOOLS._reset_widget(self._annot)
+
+    def __repr__(self):
+        return "'%s' widget on %s" % (self.field_type_string, str(self.parent))
+
+    def __del__(self):
+        self._annot.__del__()
+
+    @property
+    def next(self):
+        return self._annot.next
+%}
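The /DA (default appearance) string that `Widget.update()` writes and `Widget._parse_da()` reads back is a short content-stream fragment: colour operands followed by `g` (gray, 1 component), `rg` (RGB, 3 components) or `k` (CMYK, 4 components), then `/<font> <size> Tf`. The standalone sketch below mirrors that round trip for illustration. `build_da`, `parse_da` and `DA_COLOR_OPS` are hypothetical names, not PyMuPDF API, and unlike the `_parse_da` method shown above, `parse_da` also recognizes the CMYK `k` operator.

```python
# Hypothetical helpers, not part of PyMuPDF: they mirror how Widget.update()
# composes a /DA string and how Widget._parse_da() reads it back.
DA_COLOR_OPS = {1: "g", 3: "rg", 4: "k"}  # number of components -> colour operator


def build_da(color, font="Helv", fontsize=0):
    """Compose '<colour operands> <op> /<font> <size> Tf'."""
    if len(color) not in DA_COLOR_OPS:
        raise ValueError("color needs 1, 3 or 4 components")
    parts = ["{:g}".format(c) for c in color]
    parts.append(DA_COLOR_OPS[len(color)])
    parts.append("/{} {:g} Tf".format(font, fontsize))
    return " ".join(parts)


def parse_da(da):
    """Return (font, fontsize, color) found in a /DA string."""
    # defaults follow those used by Widget._parse_da()
    font, fontsize, color = "Helv", 0.0, (0.0, 0.0, 0.0)
    ncomp = {op: n for n, op in DA_COLOR_OPS.items()}  # colour operator -> components
    tokens = da.split()  # split on any whitespace
    for i, tok in enumerate(tokens):
        if tok == "Tf" and i >= 2:
            font = tokens[i - 2].lstrip("/")  # drop the leading '/' of the name
            fontsize = float(tokens[i - 1])
        elif tok in ncomp and i >= ncomp[tok]:
            n = ncomp[tok]
            color = tuple(float(t) for t in tokens[i - n:i])
    return font, fontsize, color
```

For example, `build_da((0, 0, 1), "Helv", 11)` produces the same text as the RGB branch of `Widget.update()`, and passing that text to `parse_da` recovers the font, size and colour.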
diff --git a/fitz/helper-geo-c.i b/fitz/helper-geo-c.i
new file mode 100644
index 0000000..97f22f4
--- /dev/null
+++ b/fitz/helper-geo-c.i
@@ -0,0 +1,196 @@
+%{
+
+//-----------------------------------------------------------------------------
+// Functions converting between PySequences and fitz geometry objects
+//-----------------------------------------------------------------------------
+static int
+JM_INT_ITEM(PyObject *obj, Py_ssize_t idx, int *result)
+{
+    PyObject *temp = PySequence_ITEM(obj, idx);
+    if (!temp) return 1;
+    *result = (int) PyLong_AsLong(temp);
+    Py_DECREF(temp);
+    if (PyErr_Occurred()) {
+        PyErr_Clear();
+        return 1;
+    }
+    return 0;
+}
+
+static int
+JM_FLOAT_ITEM(PyObject *obj, Py_ssize_t idx, float *result)
+{
+    PyObject *temp = PySequence_ITEM(obj, idx);
+    if (!temp) return 1;
+    *result = (float) PyFloat_AsDouble(temp);
+    Py_DECREF(temp);
+    if (PyErr_Occurred()) {
+        PyErr_Clear();
+        return 1;
+    }
+    return 0;
+}
+
+//-----------------------------------------------------------------------------
+// PySequence to fz_rect. Default: infinite rect
+//-----------------------------------------------------------------------------
+static fz_rect
+JM_rect_from_py(PyObject *r)
+{
+    if (!r || !PySequence_Check(r) || PySequence_Size(r) != 4)
+        return fz_infinite_rect;
+    Py_ssize_t i;
+    float f[4];
+
+    for (i = 0; i < 4; i++)
+        if (JM_FLOAT_ITEM(r, i, &f[i]) == 1) return fz_infinite_rect;
+
+    return fz_make_rect(f[0], f[1], f[2], f[3]);
+}
+
+//-----------------------------------------------------------------------------
+// PySequence from fz_rect
+//-----------------------------------------------------------------------------
+static PyObject *
+JM_py_from_rect(fz_rect r)
+{
+    return Py_BuildValue("ffff", r.x0, r.y0, r.x1, r.y1);
+}
+
+//-----------------------------------------------------------------------------
+// PySequence to fz_irect. Default: infinite irect
+//-----------------------------------------------------------------------------
+static fz_irect
+JM_irect_from_py(PyObject *r)
+{
+    if (!r || !PySequence_Check(r) || PySequence_Size(r) != 4)
+        return fz_infinite_irect;
+    int x[4];
+    Py_ssize_t i;
+
+    for (i = 0; i < 4; i++)
+        if (JM_INT_ITEM(r, i, &x[i]) == 1) return fz_infinite_irect;
+
+    return fz_make_irect(x[0], x[1], x[2], x[3]);
+}
+
+//-----------------------------------------------------------------------------
+// PySequence from fz_irect
+//-----------------------------------------------------------------------------
+static PyObject *
+JM_py_from_irect(fz_irect r)
+{
+    return Py_BuildValue("iiii", r.x0, r.y0, r.x1, r.y1);
+}
+
+
+//-----------------------------------------------------------------------------
+// PySequence to fz_point. Default: (0, 0)
+//-----------------------------------------------------------------------------
+static fz_point
+JM_point_from_py(PyObject *p)
+{
+    fz_point p0 = fz_make_point(0, 0);
+    float x, y;
+
+    if (!p || !PySequence_Check(p) || PySequence_Size(p) != 2)
+        return p0;
+
+    if (JM_FLOAT_ITEM(p, 0, &x) == 1) return p0;
+    if (JM_FLOAT_ITEM(p, 1, &y) == 1) return p0;
+
+    return fz_make_point(x, y);
+}
+
+//-----------------------------------------------------------------------------
+// PySequence from fz_point
+//-----------------------------------------------------------------------------
+static PyObject *
+JM_py_from_point(fz_point p)
+{
+    return Py_BuildValue("ff", p.x, p.y);
+}
+
+
+//-----------------------------------------------------------------------------
+// PySequence to fz_matrix. Default: fz_identity
+//-----------------------------------------------------------------------------
+static fz_matrix
+JM_matrix_from_py(PyObject *m)
+{
+    Py_ssize_t i;
+    float a[6];
+
+    if (!m || !PySequence_Check(m) || PySequence_Size(m) != 6)
+        return fz_identity;
+
+    for (i = 0; i < 6; i++)
+        if (JM_FLOAT_ITEM(m, i, &a[i]) == 1) return fz_identity;
+
+    return fz_make_matrix(a[0], a[1], a[2], a[3], a[4], a[5]);
+}
+
+//-----------------------------------------------------------------------------
+// PySequence from fz_matrix
+//-----------------------------------------------------------------------------
+static PyObject *
+JM_py_from_matrix(fz_matrix m)
+{
+    return Py_BuildValue("ffffff", m.a, m.b, m.c, m.d, m.e, m.f);
+}
+
+//-----------------------------------------------------------------------------
+// fz_quad from PySequence. If the first item is a number, the four items
+// are treated as a rect; else they must be four (x, y) pairs. Default: zeros.
+//-----------------------------------------------------------------------------
+static fz_quad
+JM_quad_from_py(PyObject *r)
+{
+    fz_quad q = fz_make_quad(0, 0, 0, 0, 0, 0, 0, 0);
+    fz_point p[4];
+    float test;
+    Py_ssize_t i;
+    PyObject *obj = NULL;
+
+    if (!r || !PySequence_Check(r) || PySequence_Size(r) != 4)
+        return q;
+
+    if (JM_FLOAT_ITEM(r, 0, &test) == 0)
+        return fz_quad_from_rect(JM_rect_from_py(r));
+
+    for (i = 0; i < 4; i++) {
+        obj = PySequence_ITEM(r, i);  // next point item
+        if (!obj || !PySequence_Check(obj) || PySequence_Size(obj) != 2)
+            goto exit_result;  // invalid: cancel the rest
+
+        if (JM_FLOAT_ITEM(obj, 0, &p[i].x) == 1) goto exit_result;
+        if (JM_FLOAT_ITEM(obj, 1, &p[i].y) == 1) goto exit_result;
+
+        Py_CLEAR(obj);
+    }
+    q.ul = p[0];
+    q.ur = p[1];
+    q.ll = p[2];
+    q.lr = p[3];
+    return q;
+
+    exit_result:;
+    Py_CLEAR(obj);
+    return q;
+}
+
+//-----------------------------------------------------------------------------
+// PySequence from fz_quad.
+//-----------------------------------------------------------------------------
+static PyObject *
+JM_py_from_quad(fz_quad quad)
+{
+    PyObject *pquad = PyTuple_New(4);
+    PyTuple_SET_ITEM(pquad, 0, JM_py_from_point(quad.ul));
+    PyTuple_SET_ITEM(pquad, 1, JM_py_from_point(quad.ur));
+    PyTuple_SET_ITEM(pquad, 2, JM_py_from_point(quad.ll));
+    PyTuple_SET_ITEM(pquad, 3, JM_py_from_point(quad.lr));
+    return pquad;
+}
+
+%}
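`JM_quad_from_py` accepts two input shapes: four numbers, which it reads as a rectangle `(x0, y0, x1, y1)` and converts with `fz_quad_from_rect`, or four `(x, y)` pairs taken in the order `ul, ur, ll, lr`. Anything else yields the all-zero quad it starts from. The Python sketch below mirrors that dispatch for illustration only. `quad_from_seq`, `_as_float`, `XY` and `ZERO_QUAD` are hypothetical names; the corner mapping assumes `fz_quad_from_rect` produces ul=(x0, y0), ur=(x1, y0), ll=(x0, y1), lr=(x1, y1); and one edge case is simplified: when the first item is a number but a later one is not, the C code converts an infinite rect, whereas the sketch returns the zero quad.

```python
from collections.abc import Sequence
from typing import Optional, Tuple

XY = Tuple[float, float]
Quad = Tuple[XY, XY, XY, XY]  # corners in the order ul, ur, ll, lr

# hypothetical name: the all-zero quad JM_quad_from_py starts from
ZERO_QUAD: Quad = ((0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0))


def _as_float(obj) -> Optional[float]:
    """Return float(obj) for number-like objects, else None (hypothetical helper)."""
    if not hasattr(obj, "__float__"):
        return None  # e.g. a tuple or a string
    try:
        return float(obj)
    except (TypeError, ValueError):
        return None


def quad_from_seq(seq) -> Quad:
    """Mirror the input dispatch of JM_quad_from_py (simplified sketch)."""
    if not isinstance(seq, Sequence) or len(seq) != 4:
        return ZERO_QUAD
    if _as_float(seq[0]) is not None:
        # first item is a number: read all four items as a rect (x0, y0, x1, y1)
        coords = [_as_float(v) for v in seq]
        if any(c is None for c in coords):
            return ZERO_QUAD  # simplification: the C code converts an infinite rect here
        x0, y0, x1, y1 = coords
        return ((x0, y0), (x1, y0), (x0, y1), (x1, y1))
    # otherwise expect four (x, y) pairs; any bad item cancels the rest
    points = []
    for item in seq:
        if not isinstance(item, Sequence) or len(item) != 2:
            return ZERO_QUAD
        x, y = _as_float(item[0]), _as_float(item[1])
        if x is None or y is None:
            return ZERO_QUAD
        points.append((x, y))
    return tuple(points)
```

Checking only the first item to choose between the two shapes keeps the test cheap, which matches what the C function does with its single `JM_FLOAT_ITEM(r, 0, &test)` probe.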
diff --git a/fitz/helper-geo-py.i b/fitz/helper-geo-py.i
new file mode 100644
index 0000000..6f8fa06
--- /dev/null
+++ b/fitz/helper-geo-py.i
@@ -0,0 +1,983 @@
+%pythoncode %{
+class Matrix(object):
+    """Matrix() - all zeros
+    Matrix(a, b, c, d, e, f)
+    Matrix(zoom-x, zoom-y) - zoom
+    Matrix(shear-x, shear-y, 1) - shear
+    Matrix(degree) - rotate
+    Matrix(Matrix) - new copy
+    Matrix(sequence) - from 'sequence'"""
+    def __init__(self, *args):
+        if not args:
+            self.a = self.b = self.c = self.d = self.e = self.f = 0.0
+            return None
+        if len(args) > 6:
+            raise ValueError("bad sequ. length")
+        if len(args) == 6:  # 6 numbers
+            self.a, self.b, self.c, self.d, self.e, self.f = map(float, args)
+            return None
+        if len(args) == 1:  # either an angle or a sequ
+            if hasattr(args[0], "__float__"):
+                theta = math.radians(args[0])
+                c = round(math.cos(theta), 8)
+                s = round(math.sin(theta), 8)
+                self.a = self.d = c
+                self.b = s
+                self.c = -s
+                self.e = self.f = 0.0
+                return None
+            else:
+                self.a, self.b, self.c, self.d, self.e, self.f = map(float, args[0])
+                return None
+        if len(args) == 2 or len(args) == 3 and args[2] == 0:
+            self.a, self.b, self.c, self.d, self.e, self.f = float(args[0]), \
+                0.0, 0.0, float(args[1]), 0.0, 0.0
+            return None
+        if len(args) == 3 and args[2] == 1:
+            self.a, self.b, self.c, self.d, self.e, self.f = 1.0, \
+                float(args[1]), float(args[0]), 1.0, 0.0, 0.0
+            return None
+        raise ValueError("illegal Matrix constructor")
+
+    def invert(self, src=None):
+        """Calculate the inverted matrix. Return 0 if successful and replace
+        current one. Else return 1 and do nothing.
+        """
+        if src is None:
+            dst = TOOLS._invert_matrix(self)
+        else:
+            dst = TOOLS._invert_matrix(src)
+        if dst[0] == 1:
+            return 1
+        self.a, self.b, self.c, self.d, self.e, self.f = dst[1]
+        return 0
+
+    def preTranslate(self, tx, ty):
+        """Calculate pre translation and replace current matrix."""
+        tx = float(tx)
+        ty = float(ty)
+        self.e += tx * self.a + ty * self.c
+        self.f += tx * self.b + ty * self.d
+        return self
+
+    def preScale(self, sx, sy):
+        """Calculate pre scaling and replace current matrix."""
+        sx = float(sx)
+        sy = float(sy)
+        self.a *= sx
+        self.b *= sx
+        self.c *= sy
+        self.d *= sy
+        return self
+
+    def preShear(self, h, v):
+        """Calculate pre shearing and replace current matrix."""
+        h = float(h)
+        v = float(v)
+        a, b = self.a, self.b
+        self.a += v * self.c
+        self.b += v * self.d
+        self.c += h * a
+        self.d += h * b
+        return self
+
+    def preRotate(self, theta):
+        """Calculate pre rotation and replace current matrix."""
+        theta = float(theta)
+        while theta < 0: theta += 360
+        while theta >= 360: theta -= 360
+        if abs(0 - theta) < EPSILON:
+            pass
+
+        elif abs(90.0 - theta) < EPSILON:
+            a = self.a
+            b = self.b
+            self.a = self.c
+            self.b = self.d
+            self.c = -a
+            self.d = -b
+
+        elif abs(180.0 - theta) < EPSILON:
+            self.a = -self.a
+            self.b = -self.b
+            self.c = -self.c
+            self.d = -self.d
+
+        elif abs(270.0 - theta) < EPSILON:
+            a = self.a
+            b = self.b
+            self.a = -self.c
+            self.b = -self.d
+            self.c = a
+            self.d = b
+
+        else:
+            rad = math.radians(theta)
+            s = math.sin(rad)
+            c = math.cos(rad)
+            a = self.a
+            b = self.b
+            self.a = c * a + s * self.c
+            self.b = c * b + s * self.d
+            self.c =-s * a + c * self.c
+            self.d =-s * b + c * self.d
+
+        return self
+
+    def concat(self, one, two):
+        """Multiply two matrices and replace current one."""
+        if not len(one) == len(two) == 6:
+            raise ValueError("bad sequ. length")
+        self.a, self.b, self.c, self.d, self.e, self.f = TOOLS._concat_matrix(one, two)
+        return self
+
+    def __getitem__(self, i):
+        return (self.a, self.b, self.c, self.d, self.e, self.f)[i]
+
+    def __setitem__(self, i, v):
+        v = float(v)
+        if   i == 0: self.a = v
+        elif i == 1: self.b = v
+        elif i == 2: self.c = v
+        elif i == 3: self.d = v
+        elif i == 4: self.e = v
+        elif i == 5: self.f = v
+        else:
+            raise IndexError("index out of range")
+        return
+
+    def __len__(self):
+        return 6
+
+    def __repr__(self):
+        return "Matrix" + str(tuple(self))
+
+    def __invert__(self):
+        """Calculate inverted matrix."""
+        m1 = Matrix()
+        m1.invert(self)
+        return m1
+    __inv__ = __invert__
+
+    def __mul__(self, m):
+        if hasattr(m, "__float__"):
+            return Matrix(self.a * m, self.b * m, self.c * m,
+                          self.d * m, self.e * m, self.f * m)
+        m1 = Matrix(1,1)
+        return m1.concat(self, m)
+
+    def __truediv__(self, m):
+        if hasattr(m, "__float__"):
+            return Matrix(self.a * 1./m, self.b * 1./m, self.c * 1./m,
+                          self.d * 1./m, self.e * 1./m, self.f * 1./m)
+        m1 = TOOLS._invert_matrix(m)[1]
+        if not m1:
+            raise ZeroDivisionError("matrix not invertible")
+        m2 = Matrix(1,1)
+        return m2.concat(self, m1)
+    __div__ = __truediv__
+
+    def __add__(self, m):
+        if hasattr(m, "__float__"):
+            return Matrix(self.a + m, self.b + m, self.c + m,
+                          self.d + m, self.e + m, self.f + m)
+        if len(m) != 6:
+            raise ValueError("bad sequ. length")
+        return Matrix(self.a + m[0], self.b + m[1], self.c + m[2],
+                          self.d + m[3], self.e + m[4], self.f + m[5])
+
+    def __sub__(self, m):
+        if hasattr(m, "__float__"):
+            return Matrix(self.a - m, self.b - m, self.c - m,
+                          self.d - m, self.e - m, self.f - m)
+        if len(m) != 6:
+            raise ValueError("bad sequ. length")
+        return Matrix(self.a - m[0], self.b - m[1], self.c - m[2],
+                          self.d - m[3], self.e - m[4], self.f - m[5])
+
+    def __pos__(self):
+        return Matrix(self)
+
+    def __neg__(self):
+        return Matrix(-self.a, -self.b, -self.c, -self.d, -self.e, -self.f)
+
+    def __bool__(self):
+        return not (max(self) == min(self) == 0)
+
+    def __nonzero__(self):
+        return not (max(self) == min(self) == 0)
+
+    def __eq__(self, mat):
+        if not hasattr(mat, "__len__"):
+            return False
+        return len(mat) == 6 and bool(self - mat) is False
+
+    def __abs__(self):
+        return math.sqrt(sum([c*c for c in self]))
+
+    norm = __abs__
+
+    @property
+    def isRectilinear(self):
+        """True if rectangles are mapped to rectangles."""
+        return (abs(self.b) < EPSILON and abs(self.c) < EPSILON) or \
+            (abs(self.a) < EPSILON and abs(self.d) < EPSILON)
+
+
+class IdentityMatrix(Matrix):
+    """Identity matrix [1, 0, 0, 1, 0, 0]"""
+    def __init__(self):
+        Matrix.__init__(self, 1.0, 1.0)
+    def __setattr__(self, name, value):
+        if name in "ad":
+            self.__dict__[name] = 1.0
+        elif name in "bcef":
+            self.__dict__[name] = 0.0
+        else:
+            self.__dict__[name] = value
+
+    def checkargs(*args):
+        raise NotImplementedError("Identity is readonly")
+
+    preRotate    = checkargs
+    preShear     = checkargs
+    preScale     = checkargs
+    preTranslate = checkargs
+    concat       = checkargs
+    invert       = checkargs
+
+    def __repr__(self):
+        return "IdentityMatrix(1.0, 0.0, 0.0, 1.0, 0.0, 0.0)"
+
+    def __hash__(self):
+        return hash((1,0,0,1,0,0))
+
+
+Identity = IdentityMatrix()
+
+class Point(object):
+    """Point() - all zeros\nPoint(x, y)\nPoint(Point) - new copy\nPoint(sequence) - from 'sequence'"""
+    def __init__(self, *args):
+        if not args:
+            self.x = 0.0
+            self.y = 0.0
+            return None
+
+        if len(args) > 2:
+            raise ValueError("bad sequ. length")
+        if len(args) == 2:
+            self.x = float(args[0])
+            self.y = float(args[1])
+            return None
+        if len(args) == 1:
+            l = args[0]
+            if hasattr(l, "__getitem__") is False:
+                raise ValueError("bad Point constructor")
+            if len(l) != 2:
+                raise ValueError("bad sequ. length")
+            self.x = float(l[0])
+            self.y = float(l[1])
+            return None
+        raise ValueError("bad Point constructor")
+
+    def transform(self, m):
+        """Replace point by its transformation with matrix-like m."""
+        if len(m) != 6:
+            raise ValueError("bad sequ. length")
+        self.x, self.y = TOOLS._transform_point(self, m)
+        return self
+
+    @property
+    def unit(self):
+        """Unit vector of the point."""
+        s = self.x * self.x + self.y * self.y
+        if s < EPSILON:
+            return Point(0,0)
+        s = math.sqrt(s)
+        return Point(self.x / s, self.y / s)
+
+    @property
+    def abs_unit(self):
+        """Unit vector with positive coordinates."""
+        s = self.x * self.x + self.y * self.y
+        if s < EPSILON:
+            return Point(0,0)
+        s = math.sqrt(s)
+        return Point(abs(self.x) / s, abs(self.y) / s)
+
+    def distance_to(self, *args):
+        """Return distance to rectangle or another point."""
+        if not len(args) > 0:
+            raise ValueError("at least one parameter must be given")
+
+        x = args[0]
+        if len(x) == 2:
+            x = Point(x)
+        elif len(x) == 4:
+            x = Rect(x)
+        else:
+            raise ValueError("arg1 must be point-like or rect-like")
+
+        if len(args) > 1:
+            unit = args[1]
+        else:
+            unit = "px"
+        u = {"px": (1.,1.), "in": (1.,72.), "cm": (2.54, 72.),
+             "mm": (25.4, 72.)}
+        f = u[unit][0] / u[unit][1]
+
+        if type(x) is Point:
+            return abs(self - x) * f
+
+        # from here on, x is a rectangle
+        # as a safeguard, make a finite copy of it
+        r = Rect(x.top_left, x.top_left)
+        r = r | x.bottom_right
+        if self in r:
+            return 0.0
+        if self.x > r.x1:
+            if self.y >= r.y1:
+                return self.distance_to(r.bottom_right, unit)
+            elif self.y <= r.y0:
+                return self.distance_to(r.top_right, unit)
+            else:
+                return (self.x - r.x1) * f
+        elif r.x0 <= self.x <= r.x1:
+            if self.y >= r.y1:
+                return (self.y - r.y1) * f
+            else:
+                return (r.y0 - self.y) * f
+        else:
+            if self.y >= r.y1:
+                return self.distance_to(r.bottom_left, unit)
+            elif self.y <= r.y0:
+                return self.distance_to(r.top_left, unit)
+            else:
+                return (r.x0 - self.x) * f
+
+    def __getitem__(self, i):
+        return (self.x, self.y)[i]
+
+    def __len__(self):
+        return 2
+
+    def __setitem__(self, i, v):
+        v = float(v)
+        if   i == 0: self.x = v
+        elif i == 1: self.y = v
+        else:
+            raise IndexError("index out of range")
+        return None
+
+    def __repr__(self):
+        return "Point" + str(tuple(self))
+
+    def __pos__(self):
+        return Point(self)
+
+    def __neg__(self):
+        return Point(-self.x, -self.y)
+
+    def __bool__(self):
+        return not (max(self) == min(self) == 0)
+
+    def __nonzero__(self):
+        return not (max(self) == min(self) == 0)
+
+    def __eq__(self, p):
+        if not hasattr(p, "__len__"):
+            return False
+        return len(p) == 2 and bool(self - p) is False
+
+    def __abs__(self):
+        return math.sqrt(self.x * self.x + self.y * self.y)
+
+    norm = __abs__
+
+    def __add__(self, p):
+        if hasattr(p, "__float__"):
+            return Point(self.x + p, self.y + p)
+        if len(p) != 2:
+            raise ValueError("bad sequ. length")
+        return Point(self.x + p[0], self.y + p[1])
+
+    def __sub__(self, p):
+        if hasattr(p, "__float__"):
+            return Point(self.x - p, self.y - p)
+        if len(p) != 2:
+            raise ValueError("bad sequ. length")
+        return Point(self.x - p[0], self.y - p[1])
+
+    def __mul__(self, m):
+        if hasattr(m, "__float__"):
+            return Point(self.x * m, self.y * m)
+        p = Point(self)
+        return p.transform(m)
+
+    def __truediv__(self, m):
+        if hasattr(m, "__float__"):
+            return Point(self.x * 1./m, self.y * 1./m)
+        m1 = TOOLS._invert_matrix(m)[1]
+        if not m1:
+            raise ZeroDivisionError("matrix not invertible")
+        p = Point(self)
+        return p.transform(m1)
+
+    __div__ = __truediv__
+
+    def __hash__(self):
+        return hash(tuple(self))
+
+class Rect(object):
+    """Rect() - all zeros\nRect(x0, y0, x1, y1)\nRect(top-left, x1, y1)\nRect(x0, y0, bottom-right)\nRect(top-left, bottom-right)\nRect(Rect or IRect) - new copy\nRect(sequence) - from 'sequence'"""
+    def __init__(self, *args):
+        if not args:
+            self.x0 = self.y0 = self.x1 = self.y1 = 0.0
+            return None
+
+        if len(args) > 4:
+            raise ValueError("bad sequ. length")
+        if len(args) == 4:
+            self.x0, self.y0, self.x1, self.y1 = map(float, args)
+            return None
+        if len(args) == 1:
+            l = args[0]
+            if hasattr(l, "__getitem__") is False:
+                raise ValueError("bad Rect constructor")
+            if len(l) != 4:
+                raise ValueError("bad sequ. length")
+            self.x0, self.y0, self.x1, self.y1 = map(float, l)
+            return None
+        if len(args) == 2:                  # 2 Points provided
+            self.x0 = float(args[0][0])
+            self.y0 = float(args[0][1])
+            self.x1 = float(args[1][0])
+            self.y1 = float(args[1][1])
+            return None
+        if len(args) == 3:                  # 2 floats and 1 Point provided
+            a0 = args[0]
+            a1 = args[1]
+            a2 = args[2]
+            if hasattr(a0, "__float__"):    # (float, float, Point) provided
+                self.x0 = float(a0)
+                self.y0 = float(a1)
+                self.x1 = float(a2[0])
+                self.y1 = float(a2[1])
+                return None
+            self.x0 = float(a0[0])          # (Point, float, float) provided
+            self.y0 = float(a0[1])
+            self.x1 = float(a1)
+            self.y1 = float(a2)
+            return None
+        raise ValueError("bad Rect constructor")
+
+    def normalize(self):
+        """Replace rectangle with its finite version."""
+        if self.x1 < self.x0:
+            self.x0, self.x1 = self.x1, self.x0
+        if self.y1 < self.y0:
+            self.y0, self.y1 = self.y1, self.y0
+        return self
+
+    @property
+    def isEmpty(self):
+        """True if rectangle area is empty."""
+        return self.x0 == self.x1 or self.y0 == self.y1
+
+    @property
+    def isInfinite(self):
+        """True if rectangle is infinite."""
+        return self.x0 > self.x1 or self.y0 > self.y1
+
+    @property
+    def top_left(self):
+        """Top-left corner."""
+        return Point(self.x0, self.y0)
+
+    @property
+    def top_right(self):
+        """Top-right corner."""
+        return Point(self.x1, self.y0)
+
+    @property
+    def bottom_left(self):
+        """Bottom-left corner."""
+        return Point(self.x0, self.y1)
+
+    @property
+    def bottom_right(self):
+        """Bottom-right corner."""
+        return Point(self.x1, self.y1)
+
+    tl = top_left
+    tr = top_right
+    bl = bottom_left
+    br = bottom_right
+
+    @property
+    def quad(self):
+        """Return Quad version of rectangle."""
+        return Quad(self.tl, self.tr, self.bl, self.br)
+
+    def morph(self, p, m):
+        """Morph with matrix-like m and point-like p.
+
+        Returns a new quad."""
+        return self.quad.morph(p, m)
+
+    def round(self):
+        """Return the IRect."""
+        return IRect(min(self.x0, self.x1), min(self.y0, self.y1),
+                     max(self.x0, self.x1), max(self.y0, self.y1))
+
+    irect = property(round)
+
+    width  = property(lambda self: abs(self.x1 - self.x0))
+    height = property(lambda self: abs(self.y1 - self.y0))
+
+    def includePoint(self, p):
+        """Extend to include point-like p."""
+        if not len(p) == 2:
+            raise ValueError("bad sequ. length")
+        self.x0, self.y0, self.x1, self.y1 = TOOLS._include_point_in_rect(self, p)
+        return self
+
+    def includeRect(self, r):
+        """Extend to include rect-like r."""
+        if not len(r) == 4:
+            raise ValueError("bad sequ. length")
+        self.x0, self.y0, self.x1, self.y1 = TOOLS._union_rect(self, r)
+        return self
+
+    def intersect(self, r):
+        """Restrict to common rect with rect-like r."""
+        if not len(r) == 4:
+            raise ValueError("bad sequ. length")
+        self.x0, self.y0, self.x1, self.y1 = TOOLS._intersect_rect(self, r)
+        return self
+
+    def contains(self, x):
+        """Check if containing point-like or rect-like x."""
+        return self.__contains__(x)
+
+    def transform(self, m):
+        """Replace with the transformation by matrix-like m."""
+        if not len(m) == 6:
+            raise ValueError("bad sequ. length")
+        self.x0, self.y0, self.x1, self.y1 = TOOLS._transform_rect(self, m)
+        return self
+
+    def __getitem__(self, i):
+        return (self.x0, self.y0, self.x1, self.y1)[i]
+
+    def __len__(self):
+        return 4
+
+    def __setitem__(self, i, v):
+        v = float(v)
+        if   i == 0: self.x0 = v
+        elif i == 1: self.y0 = v
+        elif i == 2: self.x1 = v
+        elif i == 3: self.y1 = v
+        else:
+            raise IndexError("index out of range")
+        return None
+
+    def __repr__(self):
+        return "Rect" + str(tuple(self))
+
+    def __pos__(self):
+        return Rect(self)
+
+    def __neg__(self):
+        return Rect(-self.x0, -self.y0, -self.x1, -self.y1)
+
+    def __bool__(self):
+        return not (max(self) == min(self) == 0)
+
+    def __nonzero__(self):
+        return not (max(self) == min(self) == 0)
+
+    def __eq__(self, rect):
+        if not hasattr(rect, "__len__"):
+            return False
+        return len(rect) == 4 and bool(self - rect) is False
+
+    def __abs__(self):
+        if self.isEmpty or self.isInfinite:
+            return 0.0
+        return (self.x1 - self.x0) * (self.y1 - self.y0)
+
+    def norm(self):
+        return math.sqrt(sum([c*c for c in self]))
+
+    def __add__(self, p):
+        if hasattr(p, "__float__"):
+            r = Rect(self.x0 + p, self.y0 + p, self.x1 + p, self.y1 + p)
+        else:
+            if len(p) != 4:
+                raise ValueError("bad sequ. length")
+            r = Rect(self.x0 + p[0], self.y0 + p[1], self.x1 + p[2], self.y1 + p[3])
+        return r
+
+    def __sub__(self, p):
+        if hasattr(p, "__float__"):
+            return Rect(self.x0 - p, self.y0 - p, self.x1 - p, self.y1 - p)
+        if len(p) != 4:
+            raise ValueError("bad sequ. length")
+        return Rect(self.x0 - p[0], self.y0 - p[1], self.x1 - p[2], self.y1 - p[3])
+
+    def __mul__(self, m):
+        if hasattr(m, "__float__"):
+            return Rect(self.x0 * m, self.y0 * m, self.x1 * m, self.y1 * m)
+        r = Rect(self)
+        r = r.transform(m)
+        return r
+
+    def __truediv__(self, m):
+        if hasattr(m, "__float__"):
+            return Rect(self.x0 * 1./m, self.y0 * 1./m, self.x1 * 1./m, self.y1 * 1./m)
+        im = TOOLS._invert_matrix(m)[1]
+        if not im:
+            raise ZeroDivisionError("matrix not invertible")
+        r = Rect(self)
+        r = r.transform(im)
+        return r
+
+    __div__ = __truediv__
+
+    def __contains__(self, x):
+        if hasattr(x, "__float__"):
+            return x in tuple(self)
+        l = len(x)
+        r = Rect(self).normalize()
+        if l == 4:
+            if r.isEmpty: return False
+            xr = Rect(x).normalize()
+            if xr.isEmpty: return True
+            if r.x0 <= xr.x0 and r.y0 <= xr.y0 and \
+               r.x1 >= xr.x1 and r.y1 >= xr.y1:
+               return True
+            return False
+        if l == 2:
+            if r.x0 <= x[0] <= r.x1 and \
+               r.y0 <= x[1] <= r.y1:
+               return True
+            return False
+        return False
+
+    def __or__(self, x):
+        if not hasattr(x, "__len__"):
+            raise ValueError("bad operand 2")
+
+        r = Rect(self)
+        if len(x) == 2:
+            return r.includePoint(x)
+        if len(x) == 4:
+            return r.includeRect(x)
+        raise ValueError("bad operand 2")
+
+    def __and__(self, x):
+        if not hasattr(x, "__len__"):
+            raise ValueError("bad operand 2")
+
+        r1 = Rect(x)
+        r = Rect(self)
+        return r.intersect(r1)
+
+    def intersects(self, x):
+        """Check if intersection with rectangle x is not empty."""
+        r1 = Rect(x)
+        if self.isEmpty or self.isInfinite or r1.isEmpty or r1.isInfinite:
+            return False
+        r = Rect(self)
+        if r.intersect(r1).isEmpty:
+            return False
+        return True
+
+    def __hash__(self):
+        return hash(tuple(self))
+
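The containment rules of `Rect.__contains__` above can be modeled on plain tuples. This is a minimal pure-Python sketch, not PyMuPDF's API: `normalize` and `rect_contains` are hypothetical helper names, and the single-number case (`x in tuple(self)`) is left out.

```python
def normalize(r):
    """Return r with x0 <= x1 and y0 <= y1, like Rect.normalize()."""
    x0, y0, x1, y1 = r
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def rect_contains(outer, x):
    """Hypothetical model of Rect.__contains__ for point-like and rect-like x."""
    ox0, oy0, ox1, oy1 = normalize(outer)
    if len(x) == 2:                        # point: the boundary counts as inside
        return ox0 <= x[0] <= ox1 and oy0 <= x[1] <= oy1
    if len(x) == 4:
        if ox0 == ox1 or oy0 == oy1:       # an empty container holds no rect
            return False
        ix0, iy0, ix1, iy1 = normalize(x)
        if ix0 == ix1 or iy0 == iy1:       # an empty rect is in any non-empty one
            return True
        return ox0 <= ix0 and oy0 <= iy0 and ox1 >= ix1 and oy1 >= iy1
    return False


print(rect_contains((0, 0, 10, 10), (2, 3)))          # True
print(rect_contains((10, 10, 0, 0), (2, 3, 5, 5)))    # True: container normalized first
print(rect_contains((0, 0, 10, 10), (5, 5, 15, 15)))  # False
```

Normalizing both operands first is what lets a rectangle given with swapped corners still contain the expected points and rectangles.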
+class IRect(Rect):
+    """IRect() - all zeros\nIRect(x0, y0, x1, y1)\nIRect(Rect or IRect) - new copy\nIRect(sequence) - from 'sequence'"""
+    def __init__(self, *args):
+        Rect.__init__(self, *args)
+        self.x0 = math.floor(self.x0 + 0.001)
+        self.y0 = math.floor(self.y0 + 0.001)
+        self.x1 = math.ceil(self.x1 - 0.001)
+        self.y1 = math.ceil(self.y1 - 0.001)
+        return None
+
+    @property
+    def rect(self):
+        return Rect(self)
+
+    def __repr__(self):
+        return "IRect" + str(tuple(self))
+
+    def includePoint(self, p):
+        """Extend rectangle to include point p."""
+        return Rect.includePoint(self, p).round()
+
+    def includeRect(self, r):
+        """Extend rectangle to include rectangle r."""
+        return Rect.includeRect(self, r).round()
+
+    def intersect(self, r):
+        """Restrict rectangle to intersection with rectangle r."""
+        return Rect.intersect(self, r).round()
+
+    def __setitem__(self, i, v):
+        v = int(v)
+        if   i == 0: self.x0 = v
+        elif i == 1: self.y0 = v
+        elif i == 2: self.x1 = v
+        elif i == 3: self.y1 = v
+        else:
+            raise IndexError("index out of range")
+        return None
+
+    def __pos__(self):
+        return IRect(self)
+
+    def __neg__(self):
+        return IRect(-self.x0, -self.y0, -self.x1, -self.y1)
+
+    def __add__(self, p):
+        return Rect.__add__(self, p).round()
+
+    def __sub__(self, p):
+        return Rect.__sub__(self, p).round()
+
+    def transform(self, m):
+        return Rect.transform(self, m).round()
+
+    def __mul__(self, m):
+        return Rect.__mul__(self, m).round()
+
+    def __truediv__(self, m):
+        return Rect.__truediv__(self, m).round()
+
+    def __or__(self, x):
+        return Rect.__or__(self, x).round()
+
+    def __and__(self, x):
+        return Rect.__and__(self, x).round()
+
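`IRect.__init__` floors the top-left corner and ceils the bottom-right corner after shifting each by 0.001, so coordinates within that tolerance of an integer are not pushed outward by a full unit. A small standalone sketch of that rounding rule on plain floats (`to_irect` is a hypothetical name):

```python
import math


def to_irect(x0, y0, x1, y1, tol=0.001):
    """Round a float rectangle outward to integers, as IRect.__init__ does.

    The tolerance keeps near-integer values (e.g. 0.9995) from enlarging
    the rectangle by a whole unit.
    """
    return (math.floor(x0 + tol), math.floor(y0 + tol),
            math.ceil(x1 - tol), math.ceil(y1 - tol))


print(to_irect(0.2, 0.9995, 10.1, 19.9996))  # (0, 1, 11, 20)
```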
+class Quad(object):
+    """Quad() - all zero points\nQuad(ul, ur, ll, lr)\nQuad(quad) - new copy\nQuad(sequence) - from 'sequence'"""
+    def __init__(self, *args):
+        if not args:
+            self.ul = self.ur = self.ll = self.lr = Point()
+            return None
+
+        if len(args) > 4:
+            raise ValueError("bad sequ. length")
+        if len(args) == 4:
+            self.ul, self.ur, self.ll, self.lr = map(Point, args)
+            return None
+        if len(args) == 1:
+            l = args[0]
+            if hasattr(l, "__getitem__") is False:
+                raise ValueError("bad Quad constructor")
+            if len(l) != 4:
+                raise ValueError("bad sequ. length")
+            self.ul, self.ur, self.ll, self.lr = map(Point, l)
+            return None
+        raise ValueError("bad Quad constructor")
+
+    @property
+    def isRectangular(self):
+        """Check if quad is rectangular.
+
+        Notes:
+            If so, some rotation matrix can transform it into a rectangle.
+            This is equivalent to three corners enclosing 90 degrees.
+        Returns:
+            True or False.
+        """
+
+        sine = TOOLS._sine_between(self.ul, self.ur, self.lr)
+        if abs(sine - 1) > EPSILON:  # the sine of the angle
+            return False
+
+        sine = TOOLS._sine_between(self.ur, self.lr, self.ll)
+        if abs(sine - 1) > EPSILON:
+            return False
+
+        sine = TOOLS._sine_between(self.lr, self.ll, self.ul)
+        if abs(sine - 1) > EPSILON:
+            return False
+
+        return True
+
+
+    @property
+    def isConvex(self):
+        """Check if quad is convex and not degenerate.
+
+        Notes:
+            For convexity, every line connecting two points of the quad must be
+            inside the quad. This is equivalent to every corner enclosing
+            an angle with 0 < angle < 180 degrees.
+            Excluding the "degenerate" case (all points on the same line),
+            it suffices to check that the sines of three angles are > 0.
+        Returns:
+            True or False.
+        """
+        count = 0
+        sine = TOOLS._sine_between(self.ul, self.ur, self.lr)
+        if sine > 0:
+            count += 1
+        elif sine < 0:
+            return False
+
+        sine = TOOLS._sine_between(self.ur, self.lr, self.ll)
+        if sine > 0:
+            count += 1
+        elif sine < 0:
+            return False
+
+        sine = TOOLS._sine_between(self.lr, self.ll, self.ul)
+        if sine > 0:
+            count += 1
+        elif sine < 0:
+            return False
+
+        sine = TOOLS._sine_between(self.ll, self.ul, self.ur)
+        if sine > 0:
+            count += 1
+        elif sine < 0:
+            return False
+
+        if count >= 2:
+            return True
+
+        return False
+
+
+    @property
+    def isEmpty(self):
+        """Check whether all quad corners are on the same line.
+
+        This is the case exactly if more than two corner angles are zero.
+        """
+        count = 0
+        if abs(TOOLS._sine_between(self.ul, self.ur, self.lr)) < EPSILON:
+            count += 1
+        if abs(TOOLS._sine_between(self.ur, self.lr, self.ll)) < EPSILON:
+            count += 1
+        if abs(TOOLS._sine_between(self.lr, self.ll, self.ul)) < EPSILON:
+            count += 1
+        if abs(TOOLS._sine_between(self.ll, self.ul, self.ur)) < EPSILON:
+            count += 1
+        if count <= 2:
+            return False
+        return True
+
+    width  = property(lambda self: max(abs(self.ul - self.ur), abs(self.ll - self.lr)))
+    height = property(lambda self: max(abs(self.ul - self.ll), abs(self.ur - self.lr)))
+
+    @property
+    def rect(self):
+        r = Rect()
+        r.x0 = min(self.ul.x, self.ur.x, self.lr.x, self.ll.x)
+        r.y0 = min(self.ul.y, self.ur.y, self.lr.y, self.ll.y)
+        r.x1 = max(self.ul.x, self.ur.x, self.lr.x, self.ll.x)
+        r.y1 = max(self.ul.y, self.ur.y, self.lr.y, self.ll.y)
+        return r
+
+    def __getitem__(self, i):
+        return (self.ul, self.ur, self.ll, self.lr)[i]
+
+    def __len__(self):
+        return 4
+
+    def __setitem__(self, i, v):
+        if   i == 0: self.ul = Point(v)
+        elif i == 1: self.ur = Point(v)
+        elif i == 2: self.ll = Point(v)
+        elif i == 3: self.lr = Point(v)
+        else:
+            raise IndexError("index out of range")
+        return None
+
+    def __repr__(self):
+        return "Quad" + str(tuple(self))
+
+    def __pos__(self):
+        return Quad(self)
+
+    def __neg__(self):
+        return Quad(-self.ul, -self.ur, -self.ll, -self.lr)
+
+    def __bool__(self):
+        return not self.isEmpty
+
+    def __nonzero__(self):
+        return not self.isEmpty
+
+    def __eq__(self, quad):
+        if not hasattr(quad, "__len__"):
+            return False
+        return len(quad) == 4 and (
+            self.ul == quad[0] and
+            self.ur == quad[1] and
+            self.ll == quad[2] and
+            self.lr == quad[3]
+        )
+
+    def __abs__(self):
+        if self.isEmpty:
+            return 0.0
+        return abs(self.ul - self.ur) * abs(self.ul - self.ll)
+
+
+    def morph(self, p, m):
+        """Morph the quad with matrix-like 'm' and point-like 'p'.
+
+        Return a new quad."""
+
+        p = Point(p)  # accept any point-like sequence, as documented
+        delta = Matrix(1, 1).preTranslate(p.x, p.y)
+        q = self * ~delta * m * delta
+        return q
+
+
+    def transform(self, m):
+        """Replace quad by its transformation with matrix m."""
+        if len(m) != 6:
+            raise ValueError("bad sequ. length")
+        self.ul *= m
+        self.ur *= m
+        self.ll *= m
+        self.lr *= m
+        return self
+
+    def __mul__(self, m):
+        r = Quad(self)
+        r = r.transform(m)
+        return r
+
+    def __truediv__(self, m):
+        if hasattr(m, "__float__"):
+            # a number divides every corner; transform() only accepts matrices
+            return Quad(self.ul / m, self.ur / m, self.ll / m, self.lr / m)
+        im = TOOLS._invert_matrix(m)[1]
+        if not im:
+            raise ZeroDivisionError("matrix not invertible")
+        r = Quad(self)
+        r = r.transform(im)
+        return r
+
+    __div__ = __truediv__
+
+    def __hash__(self):
+        return hash(tuple(self))
+
+%}
diff --git a/fitz/helper-other.i b/fitz/helper-other.i
new file mode 100644
index 0000000..e04bb85
--- /dev/null
+++ b/fitz/helper-other.i
@@ -0,0 +1,868 @@
+%{
+
+int LIST_APPEND_DROP(PyObject *list, PyObject *item)
+{
+    if (!list || !PyList_Check(list) || !item) return -2;
+    int rc = PyList_Append(list, item);
+    Py_DECREF(item);
+    return rc;
+}
+
+int DICT_SETITEM_DROP(PyObject *dict, PyObject *key, PyObject *value)
+{
+    if (!dict || !PyDict_Check(dict) || !key || !value) return -2;
+    int rc = PyDict_SetItem(dict, key, value);
+    Py_DECREF(value);
+    return rc;
+}
+
+int DICT_SETITEMSTR_DROP(PyObject *dict, const char *key, PyObject *value)
+{
+    if (!dict || !PyDict_Check(dict) || !key || !value) return -2;
+    int rc = PyDict_SetItemString(dict, key, value);
+    Py_DECREF(value);
+    return rc;
+}
+
+
+PyObject *JM_EscapeStrFromBuffer(fz_context *ctx, fz_buffer *buff)
+{
+    if (!buff) return PyUnicode_FromString("");
+    unsigned char *s = NULL;
+    size_t len = fz_buffer_storage(ctx, buff, &s);
+    PyObject *val = PyUnicode_DecodeRawUnicodeEscape((const char *) s, (Py_ssize_t) len, "replace");
+    if (!val) {
+        val = PyUnicode_FromString("");
+        PyErr_Clear();
+    }
+    return val;
+}
+
+PyObject *JM_UnicodeFromStr(const char *c)
+{
+    if (!c) return PyUnicode_FromString("");
+    PyObject *val = Py_BuildValue("s", c);
+    if (!val) {
+        val = PyUnicode_FromString("");
+        PyErr_Clear();
+    }
+    return val;
+}
+
+PyObject *JM_EscapeStrFromStr(const char *c)
+{
+    if (!c) return PyUnicode_FromString("");
+    PyObject *val = PyUnicode_DecodeRawUnicodeEscape(c, (Py_ssize_t) strlen(c), "replace");
+    if (!val) {
+        val = PyUnicode_FromString("");
+        PyErr_Clear();
+    }
+    return val;
+}
+
+// redirect MuPDF warnings
+void JM_mupdf_warning(void *user, const char *message)
+{
+    LIST_APPEND_DROP(JM_mupdf_warnings_store, JM_EscapeStrFromStr(message));
+}
+
+// redirect MuPDF errors
+void JM_mupdf_error(void *user, const char *message)
+{
+    LIST_APPEND_DROP(JM_mupdf_warnings_store, JM_EscapeStrFromStr(message));
+    if (JM_mupdf_show_errors == Py_True)
+        PySys_WriteStderr("mupdf: %s\n", message);
+}
+
+// a simple tracer
+void JM_TRACE(const char *id)
+{
+    PySys_WriteStdout("%s\n", id);
+}
+
+
+// put a warning on Python-stdout
+void JM_Warning(const char *id)
+{
+    PySys_WriteStdout("warning: %s\n", id);
+}
+
+#if JM_MEMORY == 1
+//-----------------------------------------------------------------------------
+// The following 3 functions replace MuPDF's standard memory allocation,
+// so that MuPDF's memory handling becomes part of Python's memory
+// management.
+//-----------------------------------------------------------------------------
+static void *JM_Py_Malloc(void *opaque, size_t size)
+{
+    return PyMem_Malloc(size);
+}
+
+static void *JM_Py_Realloc(void *opaque, void *old, size_t size)
+{
+    return PyMem_Realloc(old, size);
+}
+
+static void JM_PY_Free(void *opaque, void *ptr)
+{
+    PyMem_Free(ptr);
+}
+
+const fz_alloc_context JM_Alloc_Context =
+{
+       NULL,
+       JM_Py_Malloc,
+       JM_Py_Realloc,
+       JM_PY_Free
+};
+#endif
+
+// return a Python bool for a given integer
+PyObject *JM_BOOL(int v)
+{
+    if (v == 0)
+        Py_RETURN_FALSE;
+    Py_RETURN_TRUE;
+}
+
+PyObject *JM_fitz_config()
+{
+#if defined(TOFU)
+#define have_TOFU JM_BOOL(0)
+#else
+#define have_TOFU JM_BOOL(1)
+#endif
+#if defined(TOFU_CJK)
+#define have_TOFU_CJK JM_BOOL(0)
+#else
+#define have_TOFU_CJK JM_BOOL(1)
+#endif
+#if defined(TOFU_CJK_EXT)
+#define have_TOFU_CJK_EXT JM_BOOL(0)
+#else
+#define have_TOFU_CJK_EXT JM_BOOL(1)
+#endif
+#if defined(TOFU_CJK_LANG)
+#define have_TOFU_CJK_LANG JM_BOOL(0)
+#else
+#define have_TOFU_CJK_LANG JM_BOOL(1)
+#endif
+#if defined(TOFU_EMOJI)
+#define have_TOFU_EMOJI JM_BOOL(0)
+#else
+#define have_TOFU_EMOJI JM_BOOL(1)
+#endif
+#if defined(TOFU_HISTORIC)
+#define have_TOFU_HISTORIC JM_BOOL(0)
+#else
+#define have_TOFU_HISTORIC JM_BOOL(1)
+#endif
+#if defined(TOFU_SYMBOL)
+#define have_TOFU_SYMBOL JM_BOOL(0)
+#else
+#define have_TOFU_SYMBOL JM_BOOL(1)
+#endif
+#if defined(TOFU_SIL)
+#define have_TOFU_SIL JM_BOOL(0)
+#else
+#define have_TOFU_SIL JM_BOOL(1)
+#endif
+#if defined(TOFU_BASE14)
+#define have_TOFU_BASE14 JM_BOOL(0)
+#else
+#define have_TOFU_BASE14 JM_BOOL(1)
+#endif
+    PyObject *dict = PyDict_New();
+    DICT_SETITEMSTR_DROP(dict, "plotter-g", JM_BOOL(FZ_PLOTTERS_G));
+    DICT_SETITEMSTR_DROP(dict, "plotter-rgb", JM_BOOL(FZ_PLOTTERS_RGB));
+    DICT_SETITEMSTR_DROP(dict, "plotter-cmyk", JM_BOOL(FZ_PLOTTERS_CMYK));
+    DICT_SETITEMSTR_DROP(dict, "plotter-n", JM_BOOL(FZ_PLOTTERS_N));
+    DICT_SETITEMSTR_DROP(dict, "pdf", JM_BOOL(FZ_ENABLE_PDF));
+    DICT_SETITEMSTR_DROP(dict, "xps", JM_BOOL(FZ_ENABLE_XPS));
+    DICT_SETITEMSTR_DROP(dict, "svg", JM_BOOL(FZ_ENABLE_SVG));
+    DICT_SETITEMSTR_DROP(dict, "cbz", JM_BOOL(FZ_ENABLE_CBZ));
+    DICT_SETITEMSTR_DROP(dict, "img", JM_BOOL(FZ_ENABLE_IMG));
+    DICT_SETITEMSTR_DROP(dict, "html", JM_BOOL(FZ_ENABLE_HTML));
+    DICT_SETITEMSTR_DROP(dict, "epub", JM_BOOL(FZ_ENABLE_EPUB));
+    DICT_SETITEMSTR_DROP(dict, "jpx", JM_BOOL(FZ_ENABLE_JPX));
+    DICT_SETITEMSTR_DROP(dict, "js", JM_BOOL(FZ_ENABLE_JS));
+    DICT_SETITEMSTR_DROP(dict, "tofu", have_TOFU);
+    DICT_SETITEMSTR_DROP(dict, "tofu-cjk", have_TOFU_CJK);
+    DICT_SETITEMSTR_DROP(dict, "tofu-cjk-ext", have_TOFU_CJK_EXT);
+    DICT_SETITEMSTR_DROP(dict, "tofu-cjk-lang", have_TOFU_CJK_LANG);
+    DICT_SETITEMSTR_DROP(dict, "tofu-emoji", have_TOFU_EMOJI);
+    DICT_SETITEMSTR_DROP(dict, "tofu-historic", have_TOFU_HISTORIC);
+    DICT_SETITEMSTR_DROP(dict, "tofu-symbol", have_TOFU_SYMBOL);
+    DICT_SETITEMSTR_DROP(dict, "tofu-sil", have_TOFU_SIL);
+    DICT_SETITEMSTR_DROP(dict, "icc", JM_BOOL(FZ_ENABLE_ICC));
+    DICT_SETITEMSTR_DROP(dict, "base14", have_TOFU_BASE14);
+    DICT_SETITEMSTR_DROP(dict, "py-memory", JM_BOOL(JM_MEMORY));
+    return dict;
+}
+
+//----------------------------------------------------------------------------
+// Update a color float array with values from a Python sequence.
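+// Invalid input is not an error: *n is set to 1 and col is left unchanged.
+// Items of a valid sequence that are out of range or unreadable become 1.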
+//----------------------------------------------------------------------------
+void JM_color_FromSequence(PyObject *color, int *n, float col[4])
+{
+    if (!color || (!PySequence_Check(color) && !PyFloat_Check(color))) {
+        *n = 1;
+        return;
+    }
+    if (PyFloat_Check(color)) { // maybe just a single float
+        float c = (float) PyFloat_AsDouble(color);
+        if (!INRANGE(c, 0, 1)) {
+            *n = 1;
+            return;
+        }
+        col[0] = c;
+        *n = 1;
+        return;
+    }
+
+    int len = (int) PySequence_Size(color), rc;
+    if (!INRANGE(len, 1, 4) || len == 2) {
+        *n = 1;
+        return;
+    }
+
+    float mcol[4] = {0,0,0,0}; // local color storage
+    Py_ssize_t i;
+    for (i = 0; i < len; i++) {
+        rc = JM_FLOAT_ITEM(color, i, &mcol[i]);
+        if (!INRANGE(mcol[i], 0, 1) || rc == 1) mcol[i] = 1;
+    }
+
+    *n = len;
+    for (i = 0; i < len; i++)
+        col[i] = mcol[i];
+    return;
+}
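The validation rules of `JM_color_FromSequence` can be summarized as a pure-Python model. This is a sketch under assumptions: `color_from_sequence` is a hypothetical name, only lists and tuples are treated as sequences (the C code accepts any Python sequence), and `JM_FLOAT_ITEM` is assumed to fail for any item that is not an int or float.

```python
def color_from_sequence(color, col):
    """Model of JM_color_FromSequence: return (n, new_col).

    col is a list of 4 floats; it is copied, never modified in place.
    """
    col = list(col)
    if isinstance(color, float):               # a single gray value
        if 0 <= color <= 1:
            col[0] = color
        return 1, col
    if not isinstance(color, (list, tuple)):   # assumption: lists/tuples only
        return 1, col
    if len(color) not in (1, 3, 4):            # gray, RGB or CMYK
        return 1, col
    for i, item in enumerate(color):
        # assumption: non-numeric items count as unreadable (rc == 1)
        c = float(item) if isinstance(item, (int, float)) else 1.0
        col[i] = c if 0 <= c <= 1 else 1.0     # out of range -> 1
    return len(color), col


print(color_from_sequence((1, 0.5, 2), [0, 0, 0, 0]))  # (3, [1.0, 0.5, 1.0, 0])
print(color_from_sequence((0.2, 0.3), [0, 0, 0, 0]))   # (1, [0, 0, 0, 0])
```

A 2-item sequence is rejected outright, because no supported colorspace has two components.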
+
+// return extension for fitz image type
+const char *JM_image_extension(int type)
+{
+    switch (type) {
+        case(FZ_IMAGE_RAW): return "raw";
+        case(FZ_IMAGE_FLATE): return "flate";
+        case(FZ_IMAGE_LZW): return "lzw";
+        case(FZ_IMAGE_RLD): return "rld";
+        case(FZ_IMAGE_BMP): return "bmp";
+        case(FZ_IMAGE_GIF): return "gif";
+        case(FZ_IMAGE_JBIG2): return "jbig2";
+        case(FZ_IMAGE_JPEG): return "jpeg";
+        case(FZ_IMAGE_JPX): return "jpx";
+        case(FZ_IMAGE_JXR): return "jxr";
+        case(FZ_IMAGE_PNG): return "png";
+        case(FZ_IMAGE_PNM): return "pnm";
+        case(FZ_IMAGE_TIFF): return "tiff";
+        default: return "n/a";
+    }
+}
+
+//----------------------------------------------------------------------------
+// Turn fz_buffer into a Python bytes object
+//----------------------------------------------------------------------------
+PyObject *JM_BinFromBuffer(fz_context *ctx, fz_buffer *buffer)
+{
+
+#if  PY_VERSION_HEX < 0x03000000
+ #define PyBytes_FromString(x) PyString_FromString(x)
+ #define PyBytes_FromStringAndSize(c, l) PyString_FromStringAndSize(c, l)
+#endif
+
+    if (!buffer) {
+        return PyBytes_FromString("");
+    }
+    unsigned char *c = NULL;
+    size_t len = fz_buffer_storage(ctx, buffer, &c);
+    return PyBytes_FromStringAndSize((const char *) c, (Py_ssize_t) len);
+}
+
+//----------------------------------------------------------------------------
+// Turn fz_buffer into a Python bytearray object
+//----------------------------------------------------------------------------
+PyObject *JM_BArrayFromBuffer(fz_context *ctx, fz_buffer *buffer)
+{
+    if (!buffer) {
+        return PyByteArray_FromStringAndSize("", 0);
+    }
+    unsigned char *c = NULL;
+    size_t len = fz_buffer_storage(ctx, buffer, &c);
+    return PyByteArray_FromStringAndSize((const char *) c, (Py_ssize_t) len);
+}
+
+
+//----------------------------------------------------------------------------
+// Compress the content of an fz_buffer (deflate) into a new buffer.
+// Returns NULL if compression produced no data.
+//----------------------------------------------------------------------------
+fz_buffer *JM_compress_buffer(fz_context *ctx, fz_buffer *inbuffer)
+{
+    fz_buffer *buf = NULL;
+    fz_var(buf);
+    fz_try(ctx) {
+        size_t compressed_length = 0;
+        unsigned char *data = fz_new_deflated_data_from_buffer(ctx,
+                              &compressed_length, inbuffer, FZ_DEFLATE_BEST);
+        // never 'return' from inside fz_try: leave buf as NULL instead
+        if (data == NULL || compressed_length == 0) {
+            fz_free(ctx, data);  // fz_free accepts NULL
+        } else {
+            buf = fz_new_buffer_from_data(ctx, data, compressed_length);
+            fz_resize_buffer(ctx, buf, compressed_length);
+        }
+    }
+    fz_catch(ctx) {
+        fz_drop_buffer(ctx, buf);
+        fz_rethrow(ctx);
+    }
+    return buf;
+}
+
+//----------------------------------------------------------------------------
+// update a stream object
+// compress stream when beneficial
+//----------------------------------------------------------------------------
+void JM_update_stream(fz_context *ctx, pdf_document *doc, pdf_obj *obj, fz_buffer *buffer, int compress)
+{
+    fz_buffer *nres = NULL;
+    size_t len = fz_buffer_storage(ctx, buffer, NULL);
+    size_t nlen = len;
+
+    if (compress == 1 && len > 30) {  // only if requested; ignore small stuff
+        nres = JM_compress_buffer(ctx, buffer);
+        nlen = fz_buffer_storage(ctx, nres, NULL);
+    }
+
+    if (nres && nlen < len) {  // was it worth the effort?
+        pdf_dict_put(ctx, obj, PDF_NAME(Filter), PDF_NAME(FlateDecode));
+        pdf_update_stream(ctx, doc, obj, nres, 1);
+    } else {
+        pdf_update_stream(ctx, doc, obj, buffer, 0);
+    }
+    fz_drop_buffer(ctx, nres);
+}
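The "compress only when beneficial" decision in `JM_update_stream` can be sketched in Python. The standard `zlib` module stands in for MuPDF's deflate here (level 9 approximating `FZ_DEFLATE_BEST`), so compressed sizes will not match MuPDF byte for byte; `choose_stream_data` is a hypothetical name.

```python
import zlib


def choose_stream_data(data: bytes, compress: bool):
    """Model of JM_update_stream's choice: return (payload, use_flate)."""
    if compress and len(data) > 30:           # ignore small stuff
        packed = zlib.compress(data, 9)       # stand-in for FZ_DEFLATE_BEST
        if 0 < len(packed) < len(data):       # was it worth the effort?
            return packed, True               # caller would set /Filter /FlateDecode
    return data, False


content = b"q 1 0 0 1 0 0 cm Q\n" * 50
payload, flate = choose_stream_data(content, True)
print(flate, zlib.decompress(payload) == content)  # True True
```

Falling back to the raw bytes keeps streams that deflate poorly (already compressed images, tiny content streams) from growing.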
+
+//-----------------------------------------------------------------------------
+// Write the lowercase hex digits of the n bytes in 'in' to 'out'.
+// 'out' must provide room for 2*n + 1 bytes (including the trailing NUL).
+//-----------------------------------------------------------------------------
+void hexlify(int n, unsigned char *in, unsigned char *out)
+{
+    const unsigned char hdigit[17] = "0123456789abcdef";
+    int i, i1, i2;
+    for (i = 0; i < n; i++) {
+        i1 = in[i]>>4;
+        i2 = in[i] - i1*16;
+        out[2*i] = hdigit[i1];
+        out[2*i + 1] = hdigit[i2];
+    }
+    out[2*n] = 0;
+}
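The nibble logic of `hexlify` can be checked in Python against the built-in `bytes.hex()`. `hexlify_model` is a hypothetical name; passing a digit table in which `d` and `e` are swapped shows how a mis-ordered table silently corrupts the output.

```python
def hexlify_model(data: bytes, digits: str = "0123456789abcdef") -> str:
    """Model of the C hexlify(): two hex digits per byte, high nibble first."""
    out = []
    for b in data:                      # iterating bytes yields ints in Python 3
        out.append(digits[b >> 4])      # high nibble
        out.append(digits[b & 0x0F])    # low nibble
    return "".join(out)


print(hexlify_model(b"\xde\xad"))                      # dead
print(hexlify_model(b"\xde\xad", "0123456789abcedf"))  # edae: d and e swapped
```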
+
+//----------------------------------------------------------------------------
+// Make fz_buffer from a PyBytes, PyByteArray, io.BytesIO object
+//----------------------------------------------------------------------------
+fz_buffer *JM_BufferFromBytes(fz_context *ctx, PyObject *stream)
+{
+    if (!EXISTS(stream)) return NULL;
+    char *c = NULL;
+    PyObject *mybytes = NULL;
+    size_t len = 0;
+    fz_buffer *res = NULL;
+    fz_var(res);
+    fz_try(ctx) {
+        if (PyBytes_Check(stream)) {
+            c = PyBytes_AS_STRING(stream);
+            len = (size_t) PyBytes_GET_SIZE(stream);
+        } else if (PyByteArray_Check(stream)) {
+            c = PyByteArray_AS_STRING(stream);
+            len = (size_t) PyByteArray_GET_SIZE(stream);
+        } else if (PyObject_HasAttrString(stream, "getvalue")) {
+            // e.g. io.BytesIO: accept the result only if it is a bytes object
+            mybytes = PyObject_CallMethod(stream, "getvalue", NULL);
+            if (mybytes && PyBytes_Check(mybytes)) {
+                c = PyBytes_AS_STRING(mybytes);
+                len = (size_t) PyBytes_GET_SIZE(mybytes);
+            }
+        }
+        // all the above leave c as NULL pointer if unsuccessful
+        if (c) res = fz_new_buffer_from_copied_data(ctx, (const unsigned char *) c, len);
+    }
+    fz_always(ctx) {
+        Py_CLEAR(mybytes);
+        PyErr_Clear();
+    }
+    fz_catch(ctx) {
+        fz_drop_buffer(ctx, res);
+        fz_rethrow(ctx);
+    }
+    return res;
+}
+
+//----------------------------------------------------------------------------
+// Modified copy of SWIG_Python_str_AsChar.
+// Under Python 3, the original SWIG v3.0.12 version does *not* return NULL
+// for non-string input, unlike PyString_AsString under Python 2.
+//----------------------------------------------------------------------------
+char *JM_Python_str_AsChar(PyObject *str)
+{
+    if (!str) return NULL;
+#if PY_VERSION_HEX >= 0x03000000
+  char *newstr = NULL;
+  PyObject *xstr = PyUnicode_AsUTF8String(str);
+  if (xstr) {
+    char *cstr;
+    Py_ssize_t len;
+    PyBytes_AsStringAndSize(xstr, &cstr, &len);
+    size_t l = len + 1;
+    newstr = JM_Alloc(char, l);
+    memcpy(newstr, cstr, l);
+    Py_XDECREF(xstr);
+  }
+  return newstr;
+#else
+  return PyString_AsString(str);
+#endif
+}
+
+#if PY_VERSION_HEX >= 0x03000000
+#  define JM_Python_str_DelForPy3(x) JM_Free(x)
+#else
+#  define JM_Python_str_DelForPy3(x)
+#endif
+
+//----------------------------------------------------------------------------
+// Deep-copy a specified source page to the target location.
+// Modified copy of the function in pdfmerge.c: we also copy annotations,
+// but skip **link** annotations. In addition, we rotate the output.
+//----------------------------------------------------------------------------
+static void
+page_merge(fz_context *ctx, pdf_document *doc_des, pdf_document *doc_src, int page_from, int page_to, int rotate, int links, int copy_annots, pdf_graft_map *graft_map)
+{
+    pdf_obj *page_ref = NULL;
+    pdf_obj *page_dict = NULL;
+    pdf_obj *obj = NULL, *ref = NULL;
+
+    // list of object types (per page) we want to copy
+    pdf_obj *known_page_objs[] = {
+        PDF_NAME(Contents),
+        PDF_NAME(Resources),
+        PDF_NAME(MediaBox),
+        PDF_NAME(CropBox),
+        PDF_NAME(BleedBox),
+        PDF_NAME(TrimBox),
+        PDF_NAME(ArtBox),
+        PDF_NAME(Rotate),
+        PDF_NAME(UserUnit)
+    };
+    int i, n = nelem(known_page_objs);  // number of list elements
+    fz_var(obj);
+    fz_var(ref);
+    fz_var(page_dict);
+    fz_try(ctx) {
+        page_ref = pdf_lookup_page_obj(ctx, doc_src, page_from);
+        pdf_flatten_inheritable_page_items(ctx, page_ref);
+
+        // make a new page
+        page_dict = pdf_new_dict(ctx, doc_des, 4);
+        pdf_dict_put(ctx, page_dict, PDF_NAME(Type), PDF_NAME(Page));
+
+        // copy objects of source page into it
+        for (i = 0; i < n; i++) {
+            // borrowed reference: keep it out of 'obj', which is dropped in fz_always
+            pdf_obj *item = pdf_dict_get(ctx, page_ref, known_page_objs[i]);
+            if (item != NULL)
+                pdf_dict_put_drop(ctx, page_dict, known_page_objs[i], pdf_graft_mapped_object(ctx, graft_map, item));
+        }
+
+        // Copy the annotations, but skip types Link and Popup.
+        // Also skip IRT annotations ("in response to").
+        // Remove dict keys P (parent) and Popup from the copied annot.
+        if (copy_annots) {
+            pdf_obj *old_annots = pdf_dict_get(ctx, page_ref, PDF_NAME(Annots));
+            if (old_annots) {
+                n = pdf_array_len(ctx, old_annots);
+                pdf_obj *new_annots = pdf_new_array(ctx, doc_des, n);
+                for (i = 0; i < n; i++) {
+                    pdf_obj *o = pdf_array_get(ctx, old_annots, i);
+                    pdf_obj *subtype = pdf_dict_get(ctx, o, PDF_NAME(Subtype));
+                    if (pdf_name_eq(ctx, subtype, PDF_NAME(Link))) continue;
+                    if (pdf_name_eq(ctx, subtype, PDF_NAME(Popup))) continue;
+                    if (pdf_dict_gets(ctx, o, "IRT")) continue;
+                    pdf_obj *copy_o = pdf_graft_mapped_object(ctx, graft_map, o);
+                    pdf_dict_del(ctx, copy_o, PDF_NAME(Popup));
+                    pdf_dict_del(ctx, copy_o, PDF_NAME(P));
+                    pdf_array_push_drop(ctx, new_annots, copy_o);
+                }
+                pdf_dict_put_drop(ctx, page_dict, PDF_NAME(Annots), new_annots);
+            }
+        }
+        // rotate the page as requested
+        if (rotate != -1) {
+            pdf_dict_put_int(ctx, page_dict, PDF_NAME(Rotate), (int64_t) rotate);
+        }
+        // Now add the page dictionary to dest PDF; this returns an
+        // indirect reference to the new page object
+        ref = pdf_add_object(ctx, doc_des, page_dict);
+
+        // Insert new page at specified location
+        pdf_insert_page(ctx, doc_des, page_to, ref);
+
+    }
+    fz_always(ctx) {
+        // 'obj' only holds borrowed references, so it must not be dropped
+        pdf_drop_obj(ctx, page_dict);
+        pdf_drop_obj(ctx, ref);
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+}
+
+//-----------------------------------------------------------------------------
+// Copy a range of pages (spage, epage) from a source PDF to a specified
+// location (apage) of the target PDF.
+// If spage > epage, the sequence of source pages is reversed.
+//-----------------------------------------------------------------------------
+void JM_merge_range(fz_context *ctx, pdf_document *doc_des, pdf_document *doc_src, int spage, int epage, int apage, int rotate, int links, int annots)
+{
+    int page, afterpage;
+    pdf_graft_map *graft_map;
+    afterpage = apage;
+    graft_map = pdf_new_graft_map(ctx, doc_des);
+
+    fz_try(ctx) {
+        if (spage < epage) {
+            for (page = spage; page <= epage; page++, afterpage++)
+                page_merge(ctx, doc_des, doc_src, page, afterpage, rotate, links, annots, graft_map);
+        } else {
+            for (page = spage; page >= epage; page--, afterpage++)
+                page_merge(ctx, doc_des, doc_src, page, afterpage, rotate, links, annots, graft_map);
+        }
+    }
+
+    fz_always(ctx) {
+        pdf_drop_graft_map(ctx, graft_map);
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+}
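The page order produced by JM_merge_range can be checked without MuPDF. The sketch below is a hypothetical helper, `merge_plan`, that records the (source page, target position) pairs the two loops would pass to `page_merge`: an inclusive range with `spage > epage` is walked backwards, and each page goes one position after the previous one.

```c
#include <stddef.h>

/* Hypothetical helper: fills src[i]/dst[i] with the source page and target
 * position that JM_merge_range would pass to page_merge for the inclusive
 * range spage..epage inserted at apage. Returns the number of pairs;
 * src and dst must be large enough. */
size_t merge_plan(int spage, int epage, int apage, int *src, int *dst)
{
    size_t count = 0;
    int afterpage = apage;
    if (spage < epage) {
        for (int page = spage; page <= epage; page++, afterpage++) {
            src[count] = page;
            dst[count] = afterpage;
            count++;
        }
    } else {  /* spage >= epage: walk backwards, reversing the order */
        for (int page = spage; page >= epage; page--, afterpage++) {
            src[count] = page;
            dst[count] = afterpage;
            count++;
        }
    }
    return count;
}
```

For example, `spage = 4, epage = 2, apage = 5` yields the pairs (4, 5), (3, 6), (2, 7), so the pages arrive in reverse order.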
+
+//----------------------------------------------------------------------------
+// Return the list of outline item xref numbers (depth-first, pre-order).
+// Recursive function. Arguments:
+// 'obj'   the first outline item
+// 'xrefs' Python list to append to (empty on the initial call)
+//----------------------------------------------------------------------------
+PyObject *JM_outline_xrefs(fz_context *ctx, pdf_obj *obj, PyObject *xrefs)
+{
+    pdf_obj *first, *thisobj;
+    if (!obj) return xrefs;
+    thisobj = obj;
+    while (thisobj) {
+        LIST_APPEND_DROP(xrefs, Py_BuildValue("i", pdf_to_num(ctx, thisobj)));
+        first = pdf_dict_get(ctx, thisobj, PDF_NAME(First));  // go down first
+        if (first) xrefs = JM_outline_xrefs(ctx, first, xrefs);
+        // then go to the next sibling; after the last sibling the loop ends
+        // and the caller continues with the parent's next sibling
+        thisobj = pdf_dict_get(ctx, thisobj, PDF_NAME(Next));
+    }
+    return xrefs;
+}
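The outline walk can be illustrated on a plain in-memory tree. In this sketch, `outline_item` and `collect_xrefs` are hypothetical names, and the `first` and `next` fields stand in for the /First and /Next keys. The result is a depth-first, pre-order listing: each item, then its subtree, then its following siblings.

```c
#include <stddef.h>

/* Hypothetical stand-in for an outline item: 'first' mirrors /First
 * (first child) and 'next' mirrors /Next (next sibling). */
typedef struct outline_item {
    int xref;
    struct outline_item *first;
    struct outline_item *next;
} outline_item;

/* Append xrefs in the same order as JM_outline_xrefs: an item, its children
 * (recursively), then its following siblings. Returns the new count. */
size_t collect_xrefs(const outline_item *item, int *out, size_t count)
{
    for (; item != NULL; item = item->next) {
        out[count++] = item->xref;
        if (item->first)
            count = collect_xrefs(item->first, out, count);
    }
    return count;
}
```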
+
+//-----------------------------------------------------------------------------
+// Return the contents of a font file, identified by xref
+//-----------------------------------------------------------------------------
+fz_buffer *JM_get_fontbuffer(fz_context *ctx, pdf_document *doc, int xref)
+{
+    if (xref < 1) return NULL;
+    pdf_obj *o, *obj = NULL, *desft, *stream = NULL;
+    o = pdf_load_object(ctx, doc, xref);
+    desft = pdf_dict_get(ctx, o, PDF_NAME(DescendantFonts));
+    char *ext = NULL;
+    if (desft) {
+        obj = pdf_resolve_indirect(ctx, pdf_array_get(ctx, desft, 0));
+        obj = pdf_dict_get(ctx, obj, PDF_NAME(FontDescriptor));
+    } else {
+        obj = pdf_dict_get(ctx, o, PDF_NAME(FontDescriptor));
+    }
+
+    if (!obj) {
+        pdf_drop_obj(ctx, o);
+        PySys_WriteStdout("invalid font - FontDescriptor missing\n");
+        return NULL;
+    }
+    pdf_drop_obj(ctx, o);
+    o = obj;
+
+    obj = pdf_dict_get(ctx, o, PDF_NAME(FontFile));
+    if (obj) stream = obj;             // ext = "pfa"
+
+    obj = pdf_dict_get(ctx, o, PDF_NAME(FontFile2));
+    if (obj) stream = obj;             // ext = "ttf"
+
+    obj = pdf_dict_get(ctx, o, PDF_NAME(FontFile3));
+    if (obj) {
+        stream = obj;
+
+        obj = pdf_dict_get(ctx, obj, PDF_NAME(Subtype));
+        if (obj && !pdf_is_name(ctx, obj)) {
+            PySys_WriteStdout("invalid font descriptor subtype\n");
+            return NULL;
+        }
+
+        if (pdf_name_eq(ctx, obj, PDF_NAME(Type1C)))
+            ext = "cff";
+        else if (pdf_name_eq(ctx, obj, PDF_NAME(CIDFontType0C)))
+            ext = "cid";
+        else if (pdf_name_eq(ctx, obj, PDF_NAME(OpenType)))
+            ext = "otf";
+        else
+            PySys_WriteStdout("warning: unhandled font type '%s'\n", pdf_to_name(ctx, obj));
+    }
+
+    if (!stream) {
+        PySys_WriteStdout("warning: unhandled font type\n");
+        return NULL;
+    }
+
+    return pdf_load_stream(ctx, stream);
+}
+
+//-----------------------------------------------------------------------------
+// Return the file extension of a font file, identified by xref
+//-----------------------------------------------------------------------------
+char *JM_get_fontextension(fz_context *ctx, pdf_document *doc, int xref)
+{
+    if (xref < 1) return "n/a";
+    pdf_obj *o, *obj = NULL, *desft;
+    o = pdf_load_object(ctx, doc, xref);
+    desft = pdf_dict_get(ctx, o, PDF_NAME(DescendantFonts));
+    if (desft) {
+        obj = pdf_resolve_indirect(ctx, pdf_array_get(ctx, desft, 0));
+        obj = pdf_dict_get(ctx, obj, PDF_NAME(FontDescriptor));
+    } else {
+        obj = pdf_dict_get(ctx, o, PDF_NAME(FontDescriptor));
+    }
+
+    pdf_drop_obj(ctx, o);
+    if (!obj) return "n/a";           // this is a base-14 font
+
+    o = obj;                           // we have the FontDescriptor
+
+    obj = pdf_dict_get(ctx, o, PDF_NAME(FontFile));
+    if (obj) return "pfa";
+
+    obj = pdf_dict_get(ctx, o, PDF_NAME(FontFile2));
+    if (obj) return "ttf";
+
+    obj = pdf_dict_get(ctx, o, PDF_NAME(FontFile3));
+    if (obj) {
+        obj = pdf_dict_get(ctx, obj, PDF_NAME(Subtype));
+        if (obj && !pdf_is_name(ctx, obj)) {
+            PySys_WriteStdout("invalid font descriptor subtype\n");
+            return "n/a";
+        }
+        if (pdf_name_eq(ctx, obj, PDF_NAME(Type1C)))
+            return "cff";
+        else if (pdf_name_eq(ctx, obj, PDF_NAME(CIDFontType0C)))
+            return "cid";
+        else if (pdf_name_eq(ctx, obj, PDF_NAME(OpenType)))
+            return "otf";
+        else
+            PySys_WriteStdout("unhandled font type '%s'\n", pdf_to_name(ctx, obj));
+    }
+
+    return "n/a";
+}
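The decision table in JM_get_fontextension, restated as a small pure function for illustration. `font_extension` and its string arguments are hypothetical; the real code inspects PDF name objects in the FontDescriptor rather than C strings.

```c
#include <string.h>

/* Hypothetical restatement of the lookup in JM_get_fontextension.
 * 'fontfile_key' is the embedding key in the descriptor ("FontFile",
 * "FontFile2", "FontFile3", or NULL if none); 'subtype' is the /Subtype
 * of a FontFile3 stream (may be NULL). */
const char *font_extension(const char *fontfile_key, const char *subtype)
{
    if (fontfile_key == NULL)
        return "n/a";                     /* nothing embedded */
    if (strcmp(fontfile_key, "FontFile") == 0)
        return "pfa";                     /* Type 1 font program */
    if (strcmp(fontfile_key, "FontFile2") == 0)
        return "ttf";                     /* TrueType font program */
    if (strcmp(fontfile_key, "FontFile3") == 0 && subtype != NULL) {
        if (strcmp(subtype, "Type1C") == 0) return "cff";
        if (strcmp(subtype, "CIDFontType0C") == 0) return "cid";
        if (strcmp(subtype, "OpenType") == 0) return "otf";
    }
    return "n/a";                         /* unhandled combination */
}
```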
+
+
+//-----------------------------------------------------------------------------
+// Create a PDF object from the given string. Added in v1.14.0 because
+// MuPDF dropped its own function for this.
+//-----------------------------------------------------------------------------
+pdf_obj *JM_pdf_obj_from_str(fz_context *ctx, pdf_document *doc, char *src)
+{
+    pdf_obj *result = NULL;
+    pdf_lexbuf lexbuf;
+    fz_stream *stream = fz_open_memory(ctx, (unsigned char *)src, strlen(src));
+
+    pdf_lexbuf_init(ctx, &lexbuf, PDF_LEXBUF_SMALL);
+
+    fz_try(ctx) {
+        result = pdf_parse_stm_obj(ctx, doc, stream, &lexbuf);
+    }
+
+    fz_always(ctx) {
+        pdf_lexbuf_fin(ctx, &lexbuf);
+        fz_drop_stream(ctx, stream);
+    }
+
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+
+    return result;
+
+}
+
+//----------------------------------------------------------------------------
+// Return the /Rotate value normalized to [0, 360); values that are not a
+// multiple of 90 are replaced by 0
+//----------------------------------------------------------------------------
+int JM_norm_rotation(int rotate)
+{
+    while (rotate < 0) rotate += 360;
+    while (rotate >= 360) rotate -= 360;
+    if (rotate % 90 != 0) return 0;
+    return rotate;
+}
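A sketch of the same normalization using the remainder operator instead of the while loops. `norm_rotation` is a hypothetical name. Because C's `%` keeps the sign of the dividend, a negative remainder needs one correction.

```c
/* Remainder-based variant of JM_norm_rotation with the same results. */
int norm_rotation(int rotate)
{
    int r = rotate % 360;
    if (r < 0)
        r += 360;                  /* now 0 <= r < 360 */
    return (r % 90 == 0) ? r : 0;  /* only multiples of 90 are valid */
}
```

For instance, -90 becomes 270, 450 becomes 90, and 45 (not a multiple of 90) becomes 0.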
+
+
+//----------------------------------------------------------------------------
+// return a PDF page's /Rotate value: one of (0, 90, 180, 270)
+//----------------------------------------------------------------------------
+int JM_page_rotation(fz_context *ctx, pdf_page *page)
+{
+    int rotate = 0;
+    fz_try(ctx)
+    {
+        rotate = pdf_to_int(ctx,
+                pdf_dict_get_inheritable(ctx, page->obj, PDF_NAME(Rotate)));
+        rotate = JM_norm_rotation(rotate);
+    }
+    fz_catch(ctx) return 0;
+    return rotate;
+}
+
+
+//----------------------------------------------------------------------------
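+// Return a PDF page's MediaBox. A missing, empty or infinite box defaults to
+// US Letter (612 x 792); a box less than 1 unit wide or high becomes the
+// unit rectangle.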
+//----------------------------------------------------------------------------
+fz_rect JM_mediabox(fz_context *ctx, pdf_page *page)
+{
+    fz_rect mediabox, page_mediabox;
+
+    mediabox = pdf_to_rect(ctx, pdf_dict_get_inheritable(ctx, page->obj,
+        PDF_NAME(MediaBox)));
+    if (fz_is_empty_rect(mediabox) || fz_is_infinite_rect(mediabox))
+    {
+        mediabox.x0 = 0;
+        mediabox.y0 = 0;
+        mediabox.x1 = 612;
+        mediabox.y1 = 792;
+    }
+
+    page_mediabox.x0 = fz_min(mediabox.x0, mediabox.x1);
+    page_mediabox.y0 = fz_min(mediabox.y0, mediabox.y1);
+    page_mediabox.x1 = fz_max(mediabox.x0, mediabox.x1);
+    page_mediabox.y1 = fz_max(mediabox.y0, mediabox.y1);
+
+    if (page_mediabox.x1 - page_mediabox.x0 < 1 ||
+        page_mediabox.y1 - page_mediabox.y0 < 1)
+        page_mediabox = fz_unit_rect;
+
+    return page_mediabox;
+}
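The fallback and normalization rules of JM_mediabox, on a plain struct so they can be tried in isolation. `mb_rect` and `normalize_mediabox` are hypothetical; the `valid` flag replaces the empty/infinite test that `fz_is_empty_rect` and `fz_is_infinite_rect` perform in the original.

```c
/* Hypothetical plain-C stand-in for fz_rect. */
typedef struct { float x0, y0, x1, y1; } mb_rect;

/* Mirrors JM_mediabox on an already extracted box: valid == 0 selects the
 * US Letter fallback, swapped corners are reordered, and a box narrower or
 * shorter than 1 unit collapses to the unit rect (0, 0, 1, 1). */
mb_rect normalize_mediabox(mb_rect box, int valid)
{
    if (!valid)
        box = (mb_rect){0, 0, 612, 792};
    mb_rect r = {
        box.x0 < box.x1 ? box.x0 : box.x1,
        box.y0 < box.y1 ? box.y0 : box.y1,
        box.x0 < box.x1 ? box.x1 : box.x0,
        box.y0 < box.y1 ? box.y1 : box.y0,
    };
    if (r.x1 - r.x0 < 1 || r.y1 - r.y0 < 1)
        r = (mb_rect){0, 0, 1, 1};
    return r;
}
```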
+
+
+//----------------------------------------------------------------------------
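+// Return a PDF page's CropBox with its y coordinates flipped (measured down
+// from the top of the MediaBox); falls back to the MediaBox if not set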
+//----------------------------------------------------------------------------
+fz_rect JM_cropbox(fz_context *ctx, pdf_page *page)
+{
+    fz_rect mediabox = JM_mediabox(ctx, page);
+    fz_rect cropbox = pdf_to_rect(ctx,
+                pdf_dict_get_inheritable(ctx, page->obj, PDF_NAME(CropBox)));
+    if (fz_is_infinite_rect(cropbox) || fz_is_empty_rect(cropbox))
+        return mediabox;
+    float y0 = mediabox.y1 - cropbox.y1;
+    float y1 = mediabox.y1 - cropbox.y0;
+    cropbox.y0 = y0;
+    cropbox.y1 = y1;
+    return cropbox;
+}
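The y-flip in JM_cropbox, shown on plain structs (`box_t` and `cropbox_to_top_left` are hypothetical names). PDF's y axis points up, while MuPDF's points down, so each y value is re-measured from the top of the MediaBox and y0/y1 swap roles.

```c
/* Hypothetical plain-C stand-in for fz_rect. */
typedef struct { float x0, y0, x1, y1; } box_t;

/* Same arithmetic as JM_cropbox: distances are taken from mediabox.y1. */
box_t cropbox_to_top_left(box_t mediabox, box_t cropbox)
{
    float y0 = mediabox.y1 - cropbox.y1;  /* old top becomes new y0 */
    float y1 = mediabox.y1 - cropbox.y0;  /* old bottom becomes new y1 */
    cropbox.y0 = y0;
    cropbox.y1 = y1;
    return cropbox;
}
```

With a 612 x 792 MediaBox, a CropBox of (10, 20, 300, 400) becomes (10, 392, 300, 772).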
+
+
+//----------------------------------------------------------------------------
+// calculate width and height of the UNROTATED page
+//----------------------------------------------------------------------------
+fz_point JM_cropbox_size(fz_context *ctx, pdf_page *page)
+{
+    fz_point size;
+    fz_try(ctx)
+    {
+        fz_rect rect = JM_cropbox(ctx, page);
+        float w = (rect.x0 < rect.x1 ? rect.x1 - rect.x0 : rect.x0 - rect.x1);
+        float h = (rect.y0 < rect.y1 ? rect.y1 - rect.y0 : rect.y0 - rect.y1);
+        size = fz_make_point(w, h);
+    }
+    fz_catch(ctx) fz_rethrow(ctx);
+    return size;
+}
+
+
+//----------------------------------------------------------------------------
+// Calculate the matrix that rotates the unrotated page according to its
+// /Rotate value (identity if /Rotate is 0)
+//----------------------------------------------------------------------------
+fz_matrix JM_rotate_page_matrix(fz_context *ctx, pdf_page *page)
+{
+    if (!page) return fz_identity;  // no valid pdf page given
+    int rotation = JM_page_rotation(ctx, page);
+    if (rotation == 0) return fz_identity;  // no rotation
+    fz_matrix m;
+    fz_point cb_size = JM_cropbox_size(ctx, page);
+    float w = cb_size.x;
+    float h = cb_size.y;
+    if (rotation == 90)
+        m = fz_make_matrix(0, 1, -1, 0, h, 0);
+    else if (rotation == 180)
+        m = fz_make_matrix(-1, 0, 0, -1, w, h);
+    else
+        m = fz_make_matrix(0, -1, 1, 0, 0, w);
+    return m;
+}
+
+
+fz_matrix JM_derotate_page_matrix(fz_context *ctx, pdf_page *page)
+{  // just the inverse of rotation
+    return fz_invert_matrix(JM_rotate_page_matrix(ctx, page));
+}
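To see what the three matrices do, the sketch below applies them to page corners. `mat_t`, `pt_t`, `rotation_matrix` and `transform_point` are hypothetical; the point mapping x' = a*x + c*y + e, y' = b*x + d*y + f follows MuPDF's matrix convention. For a 90 degree rotation, the unrotated page's top-left corner (0, 0) lands at (h, 0), the top-right of the rotated h x w page.

```c
/* Hypothetical stand-ins for fz_matrix and fz_point. */
typedef struct { float a, b, c, d, e, f; } mat_t;
typedef struct { float x, y; } pt_t;

/* Same matrices as JM_rotate_page_matrix for an unrotated w x h page;
 * rotation is assumed to be normalized to 0, 90, 180 or 270 already. */
mat_t rotation_matrix(int rotation, float w, float h)
{
    if (rotation == 90)  return (mat_t){0, 1, -1, 0, h, 0};
    if (rotation == 180) return (mat_t){-1, 0, 0, -1, w, h};
    if (rotation == 270) return (mat_t){0, -1, 1, 0, 0, w};
    return (mat_t){1, 0, 0, 1, 0, 0};  /* identity */
}

/* x' = a*x + c*y + e, y' = b*x + d*y + f */
pt_t transform_point(pt_t p, mat_t m)
{
    return (pt_t){m.a * p.x + m.c * p.y + m.e, m.b * p.x + m.d * p.y + m.f};
}
```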
+
+
+//-----------------------------------------------------------------------------
+// dummy structure for various tools and utilities
+//-----------------------------------------------------------------------------
+struct Tools {int index;};
+
+typedef struct fz_item fz_item;
+
+struct fz_item
+{
+       void *key;
+       fz_storable *val;
+       size_t size;
+       fz_item *next;
+       fz_item *prev;
+       fz_store *store;
+       const fz_store_type *type;
+};
+
+struct fz_store
+{
+       int refs;
+
+       /* Every item in the store is kept in a doubly linked list, ordered
+        * by usage (so LRU entries are at the end). */
+       fz_item *head;
+       fz_item *tail;
+
+       /* We have a hash table that allows us to quickly find a subset of the
+        * entries (those whose keys are indirect objects). */
+       fz_hash_table *hash;
+
+       /* We keep track of the size of the store, and keep it below max. */
+       size_t max;
+       size_t size;
+
+       int defer_reap_count;
+       int needs_reaping;
+};
+
+
+%}
diff --git a/fitz/helper-pdfinfo.i b/fitz/helper-pdfinfo.i

new file mode 100644 (file)

index 0000000..97f86f2
--- /dev/null
+++ b/fitz/helper-pdfinfo.i
@@ -0,0 +1,231 @@
+%{
+
+//-----------------------------------------------------------------------------
+// Append information on each font in a /Font resource dictionary to a Python list
+//-----------------------------------------------------------------------------
+void JM_gather_fonts(fz_context *ctx, pdf_document *pdf, pdf_obj *dict,
+                    PyObject *fontlist, int stream_xref)
+{
+    int i, n;
+    n = pdf_dict_len(ctx, dict);
+    for (i = 0; i < n; i++)
+    {
+        pdf_obj *fontdict = NULL;
+        pdf_obj *subtype = NULL;
+        pdf_obj *basefont = NULL;
+        pdf_obj *name = NULL;
+        pdf_obj *refname = NULL;
+        pdf_obj *encoding = NULL;
+
+        refname = pdf_dict_get_key(ctx, dict, i);
+        fontdict = pdf_dict_get_val(ctx, dict, i);
+        if (!pdf_is_dict(ctx, fontdict)) {
+            fz_warn(ctx, "'%s' is not a font dict (%d 0 R)",
+                    pdf_to_name(ctx, refname), pdf_to_num(ctx, fontdict));
+            continue;
+        }
+        subtype = pdf_dict_get(ctx, fontdict, PDF_NAME(Subtype));
+        basefont = pdf_dict_get(ctx, fontdict, PDF_NAME(BaseFont));
+        if (!basefont || pdf_is_null(ctx, basefont))
+            name = pdf_dict_get(ctx, fontdict, PDF_NAME(Name));
+        else
+            name = basefont;
+        encoding = pdf_dict_get(ctx, fontdict, PDF_NAME(Encoding));
+        if (pdf_is_dict(ctx, encoding))
+            encoding = pdf_dict_get(ctx, encoding, PDF_NAME(BaseEncoding));
+        int xref = pdf_to_num(ctx, fontdict);
+        char *ext = "n/a";
+        if (xref) ext = JM_get_fontextension(ctx, pdf, xref);
+        PyObject *entry = PyTuple_New(7);
+        PyTuple_SET_ITEM(entry, 0, Py_BuildValue("i", xref));
+        PyTuple_SET_ITEM(entry, 1, Py_BuildValue("s", ext));
+        PyTuple_SET_ITEM(entry, 2, Py_BuildValue("s", pdf_to_name(ctx, subtype)));
+        PyTuple_SET_ITEM(entry, 3, JM_EscapeStrFromStr(pdf_to_name(ctx, name)));
+        PyTuple_SET_ITEM(entry, 4, Py_BuildValue("s", pdf_to_name(ctx, refname)));
+        PyTuple_SET_ITEM(entry, 5, Py_BuildValue("s", pdf_to_name(ctx, encoding)));
+        PyTuple_SET_ITEM(entry, 6, Py_BuildValue("i", stream_xref));
+        LIST_APPEND_DROP(fontlist, entry);
+    }
+}
+
+//-----------------------------------------------------------------------------
+// Append information on each image XObject in an /XObject dictionary to a Python list
+//-----------------------------------------------------------------------------
+void JM_gather_images(fz_context *ctx, pdf_document *doc, pdf_obj *dict,
+                     PyObject *imagelist, int stream_xref)
+{
+    int i, n;
+    n = pdf_dict_len(ctx, dict);
+    for (i = 0; i < n; i++) {
+        pdf_obj *imagedict, *smask;
+        pdf_obj *refname = NULL;
+        pdf_obj *type;
+        pdf_obj *width;
+        pdf_obj *height;
+        pdf_obj *bpc = NULL;
+        pdf_obj *filter = NULL;
+        pdf_obj *cs = NULL;
+        pdf_obj *altcs;
+
+        refname = pdf_dict_get_key(ctx, dict, i);
+        imagedict = pdf_dict_get_val(ctx, dict, i);
+        if (!pdf_is_dict(ctx, imagedict)) {
+            fz_warn(ctx, "'%s' is not an image dict (%d 0 R)",
+                    pdf_to_name(ctx, refname), pdf_to_num(ctx, imagedict));
+            continue;
+        }
+
+        type = pdf_dict_get(ctx, imagedict, PDF_NAME(Subtype));
+        if (!pdf_name_eq(ctx, type, PDF_NAME(Image)))
+            continue;
+
+        int xref = pdf_to_num(ctx, imagedict);
+        int gen = 0;  // holds the /SMask xref (0 if none), not a generation number
+        smask = pdf_dict_get(ctx, imagedict, PDF_NAME(SMask));
+        if (smask)
+            gen = pdf_to_num(ctx, smask);
+        filter = pdf_dict_get(ctx, imagedict, PDF_NAME(Filter));
+        if (pdf_is_array(ctx, filter)) {
+            filter = pdf_array_get(ctx, filter, 0);
+        }
+
+        altcs = NULL;
+        cs = pdf_dict_get(ctx, imagedict, PDF_NAME(ColorSpace));
+        if (pdf_is_array(ctx, cs)) {
+            pdf_obj *cses = cs;
+            cs = pdf_array_get(ctx, cses, 0);
+            if (pdf_name_eq(ctx, cs, PDF_NAME(DeviceN)) ||
+                pdf_name_eq(ctx, cs, PDF_NAME(Separation))) {
+                altcs = pdf_array_get(ctx, cses, 2);
+                if (pdf_is_array(ctx, altcs))
+                    altcs = pdf_array_get(ctx, altcs, 0);
+            }
+        }
+
+        width = pdf_dict_get(ctx, imagedict, PDF_NAME(Width));
+        height = pdf_dict_get(ctx, imagedict, PDF_NAME(Height));
+        bpc = pdf_dict_get(ctx, imagedict, PDF_NAME(BitsPerComponent));
+
+        PyObject *entry = PyTuple_New(10);
+        PyTuple_SET_ITEM(entry, 0, Py_BuildValue("i", xref));
+        PyTuple_SET_ITEM(entry, 1, Py_BuildValue("i", gen));
+        PyTuple_SET_ITEM(entry, 2, Py_BuildValue("i", pdf_to_int(ctx, width)));
+        PyTuple_SET_ITEM(entry, 3, Py_BuildValue("i", pdf_to_int(ctx, height)));
+        PyTuple_SET_ITEM(entry, 4, Py_BuildValue("i", pdf_to_int(ctx, bpc)));
+        PyTuple_SET_ITEM(entry, 5, JM_EscapeStrFromStr(pdf_to_name(ctx, cs)));
+        PyTuple_SET_ITEM(entry, 6, JM_EscapeStrFromStr(pdf_to_name(ctx, altcs)));
+        PyTuple_SET_ITEM(entry, 7, JM_EscapeStrFromStr(pdf_to_name(ctx, refname)));
+        PyTuple_SET_ITEM(entry, 8, JM_EscapeStrFromStr(pdf_to_name(ctx, filter)));
+        PyTuple_SET_ITEM(entry, 9, Py_BuildValue("i", stream_xref));
+        LIST_APPEND_DROP(imagelist, entry);
+    }
+}
+
+//-----------------------------------------------------------------------------
+// Append information on each Form XObject in an /XObject dictionary to a Python list
+//-----------------------------------------------------------------------------
+void JM_gather_forms(fz_context *ctx, pdf_document *doc, pdf_obj *dict,
+                     PyObject *imagelist, int stream_xref)
+{
+    int i, n = pdf_dict_len(ctx, dict);
+    fz_rect bbox;
+    pdf_obj *o = NULL, *m = NULL;
+    for (i = 0; i < n; i++) {
+        pdf_obj *imagedict;
+        pdf_obj *refname = NULL;
+        pdf_obj *type;
+
+        refname = pdf_dict_get_key(ctx, dict, i);
+        imagedict = pdf_dict_get_val(ctx, dict, i);
+        if (!pdf_is_dict(ctx, imagedict)) {
+            fz_warn(ctx, "'%s' is not a form dict (%d 0 R)",
+                    pdf_to_name(ctx, refname), pdf_to_num(ctx, imagedict));
+            continue;
+        }
+
+        type = pdf_dict_get(ctx, imagedict, PDF_NAME(Subtype));
+        if (!pdf_name_eq(ctx, type, PDF_NAME(Form)))
+            continue;
+
+        o = pdf_dict_get(ctx, imagedict, PDF_NAME(BBox));
+        m = pdf_dict_get(ctx, imagedict, PDF_NAME(Matrix));
+        if (o) {
+            if (m) {
+                bbox = fz_transform_rect(pdf_to_rect(ctx, o), pdf_to_matrix(ctx, m));
+            }
+            else {
+                bbox = pdf_to_rect(ctx, o);
+            }
+        }
+        else {
+            bbox = fz_infinite_rect;
+        }
+        int xref = pdf_to_num(ctx, imagedict);
+
+        PyObject *entry = PyTuple_New(4);
+        PyTuple_SET_ITEM(entry, 0, Py_BuildValue("i", xref));
+        PyTuple_SET_ITEM(entry, 1, Py_BuildValue("s", pdf_to_name(ctx, refname)));
+        PyTuple_SET_ITEM(entry, 2, Py_BuildValue("i", stream_xref));
+        PyTuple_SET_ITEM(entry, 3, Py_BuildValue("ffff",
+                                   bbox.x0, bbox.y0, bbox.x1, bbox.y1));
+        LIST_APPEND_DROP(imagelist, entry);
+    }
+}
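JM_gather_forms reports a form's /BBox after applying its /Matrix. For a finite rectangle, that transformed box is the bounding box of the four transformed corners. The sketch below shows this with hypothetical types (`xform_t`, `bbox_t`) and ignores infinite rectangles, which MuPDF's `fz_transform_rect` treats separately.

```c
/* Hypothetical stand-ins for fz_matrix and fz_rect. */
typedef struct { float a, b, c, d, e, f; } xform_t;
typedef struct { float x0, y0, x1, y1; } bbox_t;

static float min4(float a, float b, float c, float d)
{
    float m = a < b ? a : b;
    m = m < c ? m : c;
    return m < d ? m : d;
}

static float max4(float a, float b, float c, float d)
{
    float m = a > b ? a : b;
    m = m > c ? m : c;
    return m > d ? m : d;
}

/* Transform all four corners and return their bounding box. */
bbox_t transform_bbox(bbox_t r, xform_t m)
{
    float x[4] = {r.x0, r.x1, r.x0, r.x1};
    float y[4] = {r.y0, r.y0, r.y1, r.y1};
    float tx[4], ty[4];
    for (int i = 0; i < 4; i++) {
        tx[i] = m.a * x[i] + m.c * y[i] + m.e;  /* x' = a*x + c*y + e */
        ty[i] = m.b * x[i] + m.d * y[i] + m.f;  /* y' = b*x + d*y + f */
    }
    return (bbox_t){min4(tx[0], tx[1], tx[2], tx[3]),
                    min4(ty[0], ty[1], ty[2], ty[3]),
                    max4(tx[0], tx[1], tx[2], tx[3]),
                    max4(ty[0], ty[1], ty[2], ty[3])};
}
```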
+
+//-----------------------------------------------------------------------------
+// Recursively step through /Resources, collecting font (what = 1),
+// image (what = 2) or form xobject (what = 3) information
+//-----------------------------------------------------------------------------
+void JM_scan_resources(fz_context *ctx, pdf_document *pdf, pdf_obj *rsrc,
+                 PyObject *liste, int what, int stream_xref)
+{
+    pdf_obj *font, *xobj, *subrsrc;
+    int i, n, sxref;
+    if (pdf_mark_obj(ctx, rsrc)) return;    // stop on cyclic dependencies
+    fz_try(ctx) {
+        if (what == 1) {
+            font = pdf_dict_get(ctx, rsrc, PDF_NAME(Font));
+            JM_gather_fonts(ctx, pdf, font, liste, stream_xref);
+            n = pdf_dict_len(ctx, font);
+            for (i = 0; i < n; i++) {
+                pdf_obj *obj = pdf_dict_get_val(ctx, font, i);
+                if (pdf_is_stream(ctx, obj)) {
+                    sxref = pdf_to_num(ctx, obj);
+                }
+                else {
+                    sxref = 0;
+                }
+                subrsrc = pdf_dict_get(ctx, obj, PDF_NAME(Resources));
+                if (subrsrc)
+                    JM_scan_resources(ctx, pdf, subrsrc, liste, what, sxref);
+            }
+        }
+
+        xobj = pdf_dict_get(ctx, rsrc, PDF_NAME(XObject));
+
+        if (what == 2) {  // look up images
+            JM_gather_images(ctx, pdf, xobj, liste, stream_xref);
+        }
+
+        if (what == 3) {  // look up form xobjects
+            JM_gather_forms(ctx, pdf, xobj, liste, stream_xref);
+        }
+
+        n = pdf_dict_len(ctx, xobj);
+        for (i = 0; i < n; i++) {
+            pdf_obj *obj = pdf_dict_get_val(ctx, xobj, i);
+            if (pdf_is_stream(ctx, obj)) {
+                sxref = pdf_to_num(ctx, obj);
+            }
+            else {
+                sxref = 0;
+            }
+            subrsrc = pdf_dict_get(ctx, obj, PDF_NAME(Resources));
+            if (subrsrc)
+                JM_scan_resources(ctx, pdf, subrsrc, liste, what, sxref);
+        }
+    }
+    fz_always(ctx) pdf_unmark_obj(ctx, rsrc);
+    fz_catch(ctx)  fz_rethrow(ctx);
+}
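The `pdf_mark_obj` / `pdf_unmark_obj` pairing in JM_scan_resources guards only the current descent path. The sketch below models it with a hypothetical `res_node` type whose `marked` field plays the role of the object mark. A cycle stops the recursion, but a resource reachable along two different paths is visited (and reported) twice, because the mark is cleared on the way back up.

```c
#include <stddef.h>

/* Hypothetical stand-in for a resource dictionary: 'children' represents
 * the /Resources of the XObjects (or Type3 fonts) it references. */
typedef struct res_node {
    int id;
    int marked;                 /* plays the role of the pdf_mark_obj flag */
    size_t n_children;
    struct res_node **children;
} res_node;

/* Depth-first scan in the style of JM_scan_resources. Appends each visited
 * id to 'out' and returns the new count. */
size_t scan(res_node *node, int *out, size_t count)
{
    if (node->marked)
        return count;           /* already on the current path: a cycle */
    node->marked = 1;
    out[count++] = node->id;
    for (size_t i = 0; i < node->n_children; i++)
        count = scan(node->children[i], out, count);
    node->marked = 0;           /* corresponds to pdf_unmark_obj in fz_always */
    return count;
}
```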
+
+%}
diff --git a/fitz/helper-pixmap.i b/fitz/helper-pixmap.i

new file mode 100644 (file)

index 0000000..815d40d
--- /dev/null
+++ b/fitz/helper-pixmap.i
@@ -0,0 +1,352 @@
+%{
+//-----------------------------------------------------------------------------
+// pixmap helper functions
+//-----------------------------------------------------------------------------
+
+//-----------------------------------------------------------------------------
+// Clear a pixmap rectangle - my version also supports non-alpha pixmaps
+//-----------------------------------------------------------------------------
+int
+JM_clear_pixmap_rect_with_value(fz_context *ctx, fz_pixmap *dest, int value, fz_irect b)
+{
+    unsigned char *destp;
+    int x, y, w, k, destspan;
+
+    b = fz_intersect_irect(b, fz_pixmap_bbox(ctx, dest));
+    w = b.x1 - b.x0;
+    y = b.y1 - b.y0;
+    if (w <= 0 || y <= 0)
+        return 0;
+
+    destspan = dest->stride;
+    destp = dest->samples + (unsigned int)(destspan * (b.y0 - dest->y) + dest->n * (b.x0 - dest->x));
+
+    /* CMYK needs special handling (and potentially any other subtractive colorspaces) */
+    if (fz_colorspace_n(ctx, dest->colorspace) == 4) {
+        value = 255 - value;
+        do {
+            unsigned char *s = destp;
+            for (x = 0; x < w; x++) {
+                *s++ = 0;
+                *s++ = 0;
+                *s++ = 0;
+                *s++ = value;
+                if (dest->alpha) *s++ = 255;
+            }
+            destp += destspan;
+        } while (--y);
+        return 1;
+    }
+
+    do {
+        unsigned char *s = destp;
+        for (x = 0; x < w; x++) {
+            for (k = 0; k < dest->n - 1; k++)
+                *s++ = value;
+            if (dest->alpha) *s++ = 255;
+            else *s++ = value;
+        }
+        destp += destspan;
+    } while (--y);
+    return 1;
+}
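Per pixel, JM_clear_pixmap_rect_with_value writes the samples below. `clear_pixel` is a hypothetical restatement for a single pixel with `n` samples (alpha included, as in an fz_pixmap). For CMYK, the gray level is expressed through the black channel only (255 - value), with no cyan, magenta or yellow ink.

```c
/* Hypothetical single-pixel version of JM_clear_pixmap_rect_with_value.
 * 'value' is a gray level: 255 = white, 0 = black. */
void clear_pixel(unsigned char *px, int n, int alpha, int cmyk, int value)
{
    if (cmyk) {
        /* subtractive colorspace: white means no ink, so only K varies */
        px[0] = px[1] = px[2] = 0;
        px[3] = (unsigned char)(255 - value);
        if (alpha)
            px[4] = 255;            /* fully opaque */
        return;
    }
    for (int k = 0; k < n - 1; k++)
        px[k] = (unsigned char)value;
    /* last sample: opaque alpha, or the final color component */
    px[n - 1] = alpha ? 255 : (unsigned char)value;
}
```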
+
+//-----------------------------------------------------------------------------
+// fill a rect with a color tuple
+//-----------------------------------------------------------------------------
+int
+JM_fill_pixmap_rect_with_color(fz_context *ctx, fz_pixmap *dest, unsigned char col[5], fz_irect b)
+{
+    unsigned char *destp;
+    int x, y, w, i, destspan;
+
+    b = fz_intersect_irect(b, fz_pixmap_bbox(ctx, dest));
+    w = b.x1 - b.x0;
+    y = b.y1 - b.y0;
+    if (w <= 0 || y <= 0)
+        return 0;
+
+    destspan = dest->stride;
+    destp = dest->samples + (unsigned int)(destspan * (b.y0 - dest->y) + dest->n * (b.x0 - dest->x));
+
+    do {
+        unsigned char *s = destp;
+        for (x = 0; x < w; x++) {
+            for (i = 0; i < dest->n; i++)
+                *s++ = col[i];
+        }
+        destp += destspan;
+    } while (--y);
+    return 1;
+}
+
+//-----------------------------------------------------------------------------
+// invert a rectangle - also supports non-alpha pixmaps
+//-----------------------------------------------------------------------------
+int
+JM_invert_pixmap_rect(fz_context *ctx, fz_pixmap *dest, fz_irect b)
+{
+    unsigned char *destp;
+    int x, y, w, i, destspan;
+
+    b = fz_intersect_irect(b, fz_pixmap_bbox(ctx, dest));
+    w = b.x1 - b.x0;
+    y = b.y1 - b.y0;
+    if (w <= 0 || y <= 0)
+        return 0;
+
+    destspan = dest->stride;
+    destp = dest->samples + (unsigned int)(destspan * (b.y0 - dest->y) + dest->n * (b.x0 - dest->x));
+    int n0 = dest->n - dest->alpha;
+    do {
+        unsigned char *s = destp;
+        for (x = 0; x < w; x++) {
+            for (i = 0; i < n0; i++, s++)
+                *s = 255 - *s;
+            if (dest->alpha) s++;  // leave the alpha sample unchanged
+        }
+        destp += destspan;
+    } while (--y);
+    return 1;
+}
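A row-level sketch of the inversion, with the read and the write of each sample kept in separate steps: in C, an expression such as `*s++ = 255 - *s` modifies and reads `s` without an intervening sequence point, which is undefined behavior. `invert_row` is a hypothetical helper; `alpha` is assumed to be 0 or 1 and stored last, as in an fz_pixmap.

```c
/* Invert the color samples of one row of w pixels with n samples each;
 * the trailing alpha sample (if any) is left untouched. */
void invert_row(unsigned char *s, int w, int n, int alpha)
{
    int n0 = n - alpha;                 /* color samples per pixel */
    for (int x = 0; x < w; x++) {
        for (int i = 0; i < n0; i++, s++)
            *s = (unsigned char)(255 - *s);
        s += alpha;                     /* step over the alpha sample */
    }
}
```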
+
+//-----------------------------------------------------------------------------
+// Return basic properties of an image provided as bytes or bytearray
+// The function creates an fz_image and optionally returns it.
+//-----------------------------------------------------------------------------
+PyObject *JM_image_profile(fz_context *ctx, PyObject *imagedata, int keep_image)
+{
+    if (!EXISTS(imagedata)) {
+        Py_RETURN_NONE;  // nothing given
+    }
+    fz_image *image = NULL;
+    fz_buffer *res = NULL;
+    PyObject *result = NULL;
+    unsigned char *c = NULL;
+    Py_ssize_t len = 0;
+    if (PyBytes_Check(imagedata)) {
+        c = PyBytes_AS_STRING(imagedata);
+        len = PyBytes_GET_SIZE(imagedata);
+    }
+    else if (PyByteArray_Check(imagedata)) {
+        c = PyByteArray_AS_STRING(imagedata);
+        len = PyByteArray_GET_SIZE(imagedata);
+    }
+    else {
+        PySys_WriteStderr("bad image data\n");
+        Py_RETURN_NONE;
+    }
+
+    if (len < 8) {
+        PySys_WriteStderr("bad image data\n");
+        Py_RETURN_NONE;
+    }
+    int type = fz_recognize_image_format(ctx, c);
+    if (type == FZ_IMAGE_UNKNOWN) {
+        Py_RETURN_NONE;
+    }
+
+    fz_try(ctx) {
+        if (keep_image) {
+            res = fz_new_buffer_from_copied_data(ctx, c, (size_t) len);
+        }
+        else {
+            res = fz_new_buffer_from_shared_data(ctx, c, (size_t) len);
+        }
+        image = fz_new_image_from_buffer(ctx, res);
+        int xres, yres;
+        fz_image_resolution(image, &xres, &yres);
+        const char *cs_name = fz_colorspace_name(ctx, image->colorspace);
+        result = PyDict_New();
+        DICT_SETITEM_DROP(result, dictkey_width,
+                Py_BuildValue("i", image->w));
+        DICT_SETITEM_DROP(result, dictkey_height,
+                Py_BuildValue("i", image->h));
+        DICT_SETITEM_DROP(result, dictkey_xres,
+                Py_BuildValue("i", xres));
+        DICT_SETITEM_DROP(result, dictkey_yres,
+                Py_BuildValue("i", yres));
+        DICT_SETITEM_DROP(result, dictkey_colorspace,
+                Py_BuildValue("i", image->n));
+        DICT_SETITEM_DROP(result, dictkey_bpc,
+                Py_BuildValue("i", image->bpc));
+        DICT_SETITEM_DROP(result, dictkey_ext,
+                Py_BuildValue("s", JM_image_extension(type)));
+        DICT_SETITEM_DROP(result, dictkey_cs_name,
+                Py_BuildValue("s", cs_name));
+
+        if (keep_image) {
+            DICT_SETITEM_DROP(result, dictkey_image,
+                    PyLong_FromVoidPtr((void *) fz_keep_image(ctx, image)));
+        }
+    }
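+    fz_always(ctx) {
+        // Drop our own references: the image keeps its own reference to the
+        // buffer, and (if keep_image is set) Python owns the extra image
+        // reference taken with fz_keep_image above.
+        fz_drop_image(ctx, image);
+        fz_drop_buffer(ctx, res);
+    }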
+    fz_catch(ctx) {
+        Py_CLEAR(result);
+        Py_RETURN_NONE;
+    }
+    PyErr_Clear();
+    return result;
+}
+
+//----------------------------------------------------------------------------
+// Version of fz_new_pixmap_from_display_list (util.c) to also support
+// rendering of only the 'clip' part of the displaylist rectangle
+//----------------------------------------------------------------------------
+fz_pixmap *
+JM_pixmap_from_display_list(fz_context *ctx,
+                            fz_display_list *list,
+                            PyObject *ctm,
+                            fz_colorspace *cs,
+                            int alpha,
+                            PyObject *clip,
+                            fz_separations *seps
+                           )
+{
+    fz_rect rect = fz_bound_display_list(ctx, list);
+    fz_matrix matrix = JM_matrix_from_py(ctm);
+    fz_pixmap *pix = NULL;
+    fz_var(pix);
+    fz_device *dev = NULL;
+    fz_var(dev);
+    fz_rect rclip = JM_rect_from_py(clip);
+    rect = fz_intersect_rect(rect, rclip);  // no-op if clip is not given
+
+    rect = fz_transform_rect(rect, matrix);
+    fz_irect irect = fz_round_rect(rect);
+
+    pix = fz_new_pixmap_with_bbox(ctx, cs, irect, seps, alpha);
+    if (alpha)
+        fz_clear_pixmap(ctx, pix);
+    else
+        fz_clear_pixmap_with_value(ctx, pix, 0xFF);
+
+    fz_try(ctx) {
+        if (!fz_is_infinite_rect(rclip)) {
+            dev = fz_new_draw_device_with_bbox(ctx, matrix, pix, &irect);
+            fz_run_display_list(ctx, list, dev, fz_identity, rclip, NULL);
+        }
+        else {
+            dev = fz_new_draw_device(ctx, matrix, pix);
+            fz_run_display_list(ctx, list, dev, fz_identity, fz_infinite_rect, NULL);
+        }
+
+        fz_close_device(ctx, dev);
+    }
+    fz_always(ctx) {
+        fz_drop_device(ctx, dev);
+    }
+    fz_catch(ctx) {
+        fz_drop_pixmap(ctx, pix);
+        fz_rethrow(ctx);
+    }
+    return pix;
+}
+
+//----------------------------------------------------------------------------
+// Pixmap creation directly using a short-lived displaylist, so we can support
+// separations.
+//----------------------------------------------------------------------------
+fz_pixmap *
+JM_pixmap_from_page(fz_context *ctx,
+                    fz_document *doc,
+                    fz_page *page,
+                    PyObject *ctm,
+                    fz_colorspace *cs,
+                    int alpha,
+                    int annots,
+                    PyObject *clip
+                   )
+{
+    enum { SPOTS_NONE, SPOTS_OVERPRINT_SIM, SPOTS_FULL };
+    int spots;
+    if (FZ_ENABLE_SPOT_RENDERING)
+        spots = SPOTS_OVERPRINT_SIM;
+    else
+        spots = SPOTS_NONE;
+
+    fz_separations *seps = NULL;
+    fz_var(seps);  // assigned in fz_try, dropped in fz_always
+    fz_pixmap *pix = NULL;
+    fz_colorspace *oi = NULL;
+    fz_var(oi);
+    fz_colorspace *colorspace = cs;
+    fz_rect rect;
+    fz_irect bbox;
+    fz_device *dev = NULL;
+    fz_var(dev);
+    fz_matrix matrix = JM_matrix_from_py(ctm);
+    rect = fz_bound_page(ctx, page);
+    fz_rect rclip = JM_rect_from_py(clip);
+    rect = fz_intersect_rect(rect, rclip);  // no-op if clip is not given
+    rect = fz_transform_rect(rect, matrix);
+    bbox = fz_round_rect(rect);
+
+    fz_try(ctx) {
+        // colorspace of the document's /OutputIntents (output intent), if any
+        oi = fz_document_output_intent(ctx, doc);
+        // if present and compatible, use it instead of the parameter
+        if (oi) {
+            if (fz_colorspace_n(ctx, oi) == fz_colorspace_n(ctx, cs)) {
+                colorspace = fz_keep_colorspace(ctx, oi);
+            }
+        }
+
+        // check if spots rendering is available and if so use separations
+        if (spots != SPOTS_NONE) {
+            seps = fz_page_separations(ctx, page);
+            if (seps) {
+                int i, n = fz_count_separations(ctx, seps);
+                if (spots == SPOTS_FULL)
+                    for (i = 0; i < n; i++)
+                        fz_set_separation_behavior(ctx, seps, i, FZ_SEPARATION_SPOT);
+                else
+                    for (i = 0; i < n; i++)
+                        fz_set_separation_behavior(ctx, seps, i, FZ_SEPARATION_COMPOSITE);
+            } else if (fz_page_uses_overprint(ctx, page)) {
+                /* This page uses overprint, so we need an empty
+                 * sep object to force the overprint simulation on. */
+                seps = fz_new_separations(ctx, 0);
+            } else if (oi && fz_colorspace_n(ctx, oi) != fz_colorspace_n(ctx, colorspace)) {
+                /* We have an output intent, and it's incompatible
+                 * with the colorspace our device needs. Force the
+                 * overprint simulation on, because this ensures that
+                 * we 'simulate' the output intent too. */
+                seps = fz_new_separations(ctx, 0);
+            }
+        }
+
+        pix = fz_new_pixmap_with_bbox(ctx, colorspace, bbox, seps, alpha);
+
+        if (alpha) {
+            fz_clear_pixmap(ctx, pix);
+        } else {
+            fz_clear_pixmap_with_value(ctx, pix, 0xFF);
+        }
+
+        dev = fz_new_draw_device(ctx, matrix, pix);
+        if (annots) {
+            fz_run_page(ctx, page, dev, fz_identity, NULL);
+        } else {
+            fz_run_page_contents(ctx, page, dev, fz_identity, NULL);
+        }
+        fz_close_device(ctx, dev);
+    }
+    fz_always(ctx) {
+        fz_drop_device(ctx, dev);
+        fz_drop_separations(ctx, seps);
+        fz_drop_colorspace(ctx, oi);
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return pix;
+}
+
+%}
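Both `JM_pixmap_from_display_list` and `JM_pixmap_from_page` size the pixmap the same way: intersect the bounds with the clip, transform by the matrix, then round to an integer bbox. The sketch below models that in plain Python. It is not PyMuPDF API. It assumes rectangles are `(x0, y0, x1, y1)` tuples and matrices are `(a, b, c, d, e, f)` tuples. `None` stands in for a missing (infinite) clip. The outward rounding only approximates MuPDF's `fz_round_rect`, and empty intersections are not special-cased. The helper names are hypothetical.

```python
import math


def intersect(rect, clip):
    """Intersect rect with clip; clip=None acts like an infinite rect (no-op)."""
    if clip is None:
        return rect
    return (
        max(rect[0], clip[0]),
        max(rect[1], clip[1]),
        min(rect[2], clip[2]),
        min(rect[3], clip[3]),
    )


def transform(rect, matrix):
    """Return the bounding box of the four corners of rect mapped by matrix."""
    a, b, c, d, e, f = matrix
    corners = [(rect[0], rect[1]), (rect[2], rect[1]),
               (rect[0], rect[3]), (rect[2], rect[3])]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def round_outward(rect):
    """Smallest integer bbox containing rect (floor top-left, ceil bottom-right)."""
    return (math.floor(rect[0]), math.floor(rect[1]),
            math.ceil(rect[2]), math.ceil(rect[3]))


def pixmap_bbox(bounds, matrix, clip=None):
    """Hypothetical helper: pixel bbox of the pixmap that would be allocated."""
    return round_outward(transform(intersect(bounds, clip), matrix))


if __name__ == "__main__":
    a4 = (0, 0, 595, 842)
    print(pixmap_bbox(a4, (2, 0, 0, 2, 0, 0)))  # (0, 0, 1190, 1684)
    print(pixmap_bbox(a4, (2, 0, 0, 2, 0, 0), clip=(100, 100, 200.3, 300)))  # (200, 200, 401, 600)
```

The clip is applied before the transformation because it is given in untransformed page coordinates. The resulting bbox is in device (pixel) space.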
diff --git a/fitz/helper-portfolio.i b/fitz/helper-portfolio.i
new file mode 100644
index 0000000..89c4255
--- /dev/null
+++ b/fitz/helper-portfolio.i
@@ -0,0 +1,66 @@
+%{
+//-----------------------------------------------------------------------------
+// perform some cleaning related to /EmbeddedFiles:
+// (1) remove an empty /Collection entry
+// (2) if /Names/EmbeddedFiles/Names exists, set /PageMode/UseAttachments
+// Note: /Limits entries of the embedded-files name tree are not touched here.
+//-----------------------------------------------------------------------------
+void JM_embedded_clean(fz_context *ctx, pdf_document *pdf)
+{
+    pdf_obj *root = pdf_dict_get(ctx, pdf_trailer(ctx, pdf), PDF_NAME(Root));
+
+    // remove any empty /Collection entry
+    pdf_obj *coll = pdf_dict_get(ctx, root, PDF_NAME(Collection));
+    if (coll && pdf_dict_len(ctx, coll) == 0)
+        pdf_dict_del(ctx, root, PDF_NAME(Collection));
+
+    pdf_obj *efiles = pdf_dict_getl(ctx, root,
+                                    PDF_NAME(Names),
+                                    PDF_NAME(EmbeddedFiles),
+                                    PDF_NAME(Names),
+                                    NULL);
+    if (efiles) {
+        pdf_dict_put_name(ctx, root, PDF_NAME(PageMode), "UseAttachments");
+    }
+    return;
+}
+
+//-----------------------------------------------------------------------------
+// embed a new file in a PDF and return its /Filespec dictionary (usable for
+// /EmbeddedFiles entries, but not only there)
+//-----------------------------------------------------------------------------
+pdf_obj *JM_embed_file(fz_context *ctx,
+                       pdf_document *pdf,
+                       fz_buffer *buf,
+                       char *filename,
+                       char *ufilename,
+                       char *desc,
+                       int compress)
+{
+    size_t len = 0;
+    pdf_obj *ef, *f, *params, *val = NULL;
+    fz_var(val);
+    fz_try(ctx) {
+        val = pdf_new_dict(ctx, pdf, 6);
+        pdf_dict_put_dict(ctx, val, PDF_NAME(CI), 4);
+        ef = pdf_dict_put_dict(ctx, val, PDF_NAME(EF), 4);
+        pdf_dict_put_text_string(ctx, val, PDF_NAME(F), filename);
+        pdf_dict_put_text_string(ctx, val, PDF_NAME(UF), ufilename);
+        pdf_dict_put_text_string(ctx, val, PDF_NAME(Desc), desc);
+        pdf_dict_put(ctx, val, PDF_NAME(Type), PDF_NAME(Filespec));
+        f = pdf_add_stream(ctx, pdf,
+                           fz_new_buffer_from_copied_data(ctx, "  ", 1),
+                           NULL, 0);
+        pdf_dict_put_drop(ctx, ef, PDF_NAME(F), f);
+        JM_update_stream(ctx, pdf, f, buf, compress);
+        len = fz_buffer_storage(ctx, buf, NULL);
+        pdf_dict_put_int(ctx, f, PDF_NAME(DL), len);
+        pdf_dict_put_int(ctx, f, PDF_NAME(Length), len);
+        params = pdf_dict_put_dict(ctx, f, PDF_NAME(Params), 4);
+        pdf_dict_put_int(ctx, params, PDF_NAME(Size), len);
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return val;
+}
+%}
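The catalog cleanup in `JM_embedded_clean` and the `/Filespec` layout built by `JM_embed_file` can be modeled with nested Python dicts. This is only an illustration, not PyMuPDF API, and it rests on several assumptions. PDF dictionaries are plain `dict`s keyed by name without the leading slash, and name values are strings. The stream is a dict carrying its bytes under a made-up `_data` key. Compression and the temporary placeholder stream are ignored. The helper names `embedded_clean` and `make_filespec` are hypothetical.

```python
def embedded_clean(root):
    """Model of JM_embedded_clean on a catalog ("Root") given as nested dicts."""
    # drop an empty /Collection entry
    coll = root.get("Collection")
    if coll is not None and len(coll) == 0:
        del root["Collection"]
    # if /Names/EmbeddedFiles/Names exists at all (even empty), show attachments
    efiles = root.get("Names", {}).get("EmbeddedFiles", {}).get("Names")
    if efiles is not None:
        root["PageMode"] = "UseAttachments"
    return root


def make_filespec(data, filename, ufilename, desc):
    """Model of the /Filespec dictionary that JM_embed_file creates."""
    size = len(data)  # length of the original (uncompressed) buffer
    stream = {
        "DL": size,
        "Length": size,
        "Params": {"Size": size},
        "_data": data,  # hypothetical key standing in for the stream contents
    }
    return {
        "Type": "Filespec",
        "F": filename,
        "UF": ufilename,
        "Desc": desc,
        "CI": {},
        "EF": {"F": stream},
    }


if __name__ == "__main__":
    root = {"Collection": {}, "Names": {"EmbeddedFiles": {"Names": []}}}
    print(embedded_clean(root))
    print(make_filespec(b"hello", "a.txt", "a.txt", "demo")["EF"]["F"]["DL"])
```

Like the C code, the model tests only for the presence of the `/Names` array, not whether it is empty.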
diff --git a/fitz/helper-python.i b/fitz/helper-python.i
new file mode 100644
index 0000000..810ec97
--- /dev/null
+++ b/fitz/helper-python.i
@@ -0,0 +1,1439 @@
+%pythoncode %{
+# ------------------------------------------------------------------------------
+# link kinds and link flags
+# ------------------------------------------------------------------------------
+LINK_NONE = 0
+LINK_GOTO = 1
+LINK_URI = 2
+LINK_LAUNCH = 3
+LINK_NAMED = 4
+LINK_GOTOR = 5
+LINK_FLAG_L_VALID = 1
+LINK_FLAG_T_VALID = 2
+LINK_FLAG_R_VALID = 4
+LINK_FLAG_B_VALID = 8
+LINK_FLAG_FIT_H = 16
+LINK_FLAG_FIT_V = 32
+LINK_FLAG_R_IS_ZOOM = 64
+
+# ------------------------------------------------------------------------------
+# Text handling flags
+# ------------------------------------------------------------------------------
+TEXT_ALIGN_LEFT = 0
+TEXT_ALIGN_CENTER = 1
+TEXT_ALIGN_RIGHT = 2
+TEXT_ALIGN_JUSTIFY = 3
+
+TEXT_OUTPUT_TEXT = 0
+TEXT_OUTPUT_HTML = 1
+TEXT_OUTPUT_JSON = 2
+TEXT_OUTPUT_XML = 3
+TEXT_OUTPUT_XHTML = 4
+
+TEXT_PRESERVE_LIGATURES = 1
+TEXT_PRESERVE_WHITESPACE = 2
+TEXT_PRESERVE_IMAGES = 4
+TEXT_INHIBIT_SPACES = 8
+
+# ------------------------------------------------------------------------------
+# Simple text encoding options
+# ------------------------------------------------------------------------------
+TEXT_ENCODING_LATIN = 0
+TEXT_ENCODING_GREEK = 1
+TEXT_ENCODING_CYRILLIC = 2
+# ------------------------------------------------------------------------------
+# Stamp annotation icon numbers
+# ------------------------------------------------------------------------------
+STAMP_Approved = 0
+STAMP_AsIs = 1
+STAMP_Confidential = 2
+STAMP_Departmental = 3
+STAMP_Experimental = 4
+STAMP_Expired = 5
+STAMP_Final = 6
+STAMP_ForComment = 7
+STAMP_ForPublicRelease = 8
+STAMP_NotApproved = 9
+STAMP_NotForPublicRelease = 10
+STAMP_Sold = 11
+STAMP_TopSecret = 12
+STAMP_Draft = 13
+
+# ------------------------------------------------------------------------------
+# Base 14 font names and dictionary
+# ------------------------------------------------------------------------------
+Base14_fontnames = (
+    "Courier",
+    "Courier-Oblique",
+    "Courier-Bold",
+    "Courier-BoldOblique",
+    "Helvetica",
+    "Helvetica-Oblique",
+    "Helvetica-Bold",
+    "Helvetica-BoldOblique",
+    "Times-Roman",
+    "Times-Italic",
+    "Times-Bold",
+    "Times-BoldItalic",
+    "Symbol",
+    "ZapfDingbats",
+)
+
+Base14_fontdict = {}
+for f in Base14_fontnames:
+    Base14_fontdict[f.lower()] = f
+Base14_fontdict["helv"] = "Helvetica"
+Base14_fontdict["heit"] = "Helvetica-Oblique"
+Base14_fontdict["hebo"] = "Helvetica-Bold"
+Base14_fontdict["hebi"] = "Helvetica-BoldOblique"
+Base14_fontdict["cour"] = "Courier"
+Base14_fontdict["coit"] = "Courier-Oblique"
+Base14_fontdict["cobo"] = "Courier-Bold"
+Base14_fontdict["cobi"] = "Courier-BoldOblique"
+Base14_fontdict["tiro"] = "Times-Roman"
+Base14_fontdict["tibo"] = "Times-Bold"
+Base14_fontdict["tiit"] = "Times-Italic"
+Base14_fontdict["tibi"] = "Times-BoldItalic"
+Base14_fontdict["symb"] = "Symbol"
+Base14_fontdict["zadb"] = "ZapfDingbats"
+
+annot_skel = {
+    "goto1": "<</A<</S/GoTo/D[%i 0 R/XYZ %g %g 0]>>/Rect[%s]/BS<</W 0>>/Subtype/Link>>",
+    "goto2": "<</A<</S/GoTo/D%s>>/Rect[%s]/BS<</W 0>>/Subtype/Link>>",
+    "gotor1": "<</A<</S/GoToR/D[%i /XYZ %g %g 0]/F<</F(%s)/UF(%s)/Type/Filespec>>>>/Rect[%s]/BS<</W 0>>/Subtype/Link>>",
+    "gotor2": "<</A<</S/GoToR/D%s/F(%s)>>/Rect[%s]/BS<</W 0>>/Subtype/Link>>",
+    "launch": "<</A<</S/Launch/F<</F(%s)/UF(%s)/Type/Filespec>>>>/Rect[%s]/BS<</W 0>>/Subtype/Link>>",
+    "uri": "<</A<</S/URI/URI(%s)>>/Rect[%s]/BS<</W 0>>/Subtype/Link>>",
+    "named": "<</A<</S/Named/N/%s/Type/Action>>/Rect[%s]/BS<</W 0>>/Subtype/Link>>",
+}
+
+
+def _toc_remove_page(toc, first, last):
+    """ Remove all ToC entries pointing to certain pages.
+
+    Args:
+        toc: old table of contents generated with getToC(False).
+        first: (int) number of first page to remove.
+        last: (int) number of last page to remove.
+    Returns:
+        Modified table of contents, which should be used by PDF
+        document method setToC.
+    """
+    toc2 = []  # intermediate new toc
+    count = last - first + 1  # number of pages to remove
+    # step 1: remove numbers from toc
+    for t in toc:
+        if first <= t[2] <= last:  # skip entries between first and last
+            continue
+        if t[2] < first:  # keep smaller page numbers
+            toc2.append(t)
+            continue
+        # larger page numbers
+        t[2] -= count  # decrease page number
+        d = t[3]
+        if d["kind"] == LINK_GOTO:
+            d["page"] -= count
+            t[3] = d
+        toc2.append(t)
+
+    toc3 = []  # final new toc
+    old_lvl = 0
+
+    # step 2: deal with hierarchy lvl gaps > 1
+    for t in toc2:
+        while t[0] - old_lvl > 1:  # lvl gap too large
+            old_lvl += 1  # increase previous lvl
+            toc3.append([old_lvl] + t[1:])  # insert a filler item
+        old_lvl = t[0]
+        toc3.append(t)
+
+    return toc3
+
+
+def getTextlength(text, fontname="helv", fontsize=11, encoding=0):
+    """Calculate length of a string for a given built-in font.
+
+    Args:
+        text: the string to measure.
+        fontname: name of the font.
+        fontsize: size of font in points.
+        encoding: encoding to use (0=Latin, 1=Greek, 2=Cyrillic).
+    Returns:
+        (float) length of text.
+    """
+    fontname = fontname.lower()
+    basename = Base14_fontdict.get(fontname, None)
+
+    glyphs = None
+    if basename == "Symbol":
+        glyphs = symbol_glyphs
+    if basename == "ZapfDingbats":
+        glyphs = zapf_glyphs
+    if glyphs is not None:
+        w = sum([glyphs[ord(c)][1] if ord(c) < 256 else glyphs[183][1] for c in text])
+        return w * fontsize
+
+    if fontname in Base14_fontdict.keys():
+        return TOOLS._measure_string(
+            text, Base14_fontdict[fontname], fontsize, encoding
+        )
+
+    if fontname in (
+        "china-t",
+        "china-s",
+        "china-ts",
+        "china-ss",
+        "japan",
+        "japan-s",
+        "korea",
+        "korea-s",
+    ):
+        return len(text) * fontsize
+
+    raise ValueError("Font '%s' is unsupported" % fontname)
+
+
+# ------------------------------------------------------------------------------
+# Glyph list for the built-in font 'ZapfDingbats'
+# ------------------------------------------------------------------------------
+zapf_glyphs = (
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (32, 0.278),
+    (33, 0.974),
+    (34, 0.961),
+    (35, 0.974),
+    (36, 0.98),
+    (37, 0.719),
+    (38, 0.789),
+    (39, 0.79),
+    (40, 0.791),
+    (41, 0.69),
+    (42, 0.96),
+    (43, 0.939),
+    (44, 0.549),
+    (45, 0.855),
+    (46, 0.911),
+    (47, 0.933),
+    (48, 0.911),
+    (49, 0.945),
+    (50, 0.974),
+    (51, 0.755),
+    (52, 0.846),
+    (53, 0.762),
+    (54, 0.761),
+    (55, 0.571),
+    (56, 0.677),
+    (57, 0.763),
+    (58, 0.76),
+    (59, 0.759),
+    (60, 0.754),
+    (61, 0.494),
+    (62, 0.552),
+    (63, 0.537),
+    (64, 0.577),
+    (65, 0.692),
+    (66, 0.786),
+    (67, 0.788),
+    (68, 0.788),
+    (69, 0.79),
+    (70, 0.793),
+    (71, 0.794),
+    (72, 0.816),
+    (73, 0.823),
+    (74, 0.789),
+    (75, 0.841),
+    (76, 0.823),
+    (77, 0.833),
+    (78, 0.816),
+    (79, 0.831),
+    (80, 0.923),
+    (81, 0.744),
+    (82, 0.723),
+    (83, 0.749),
+    (84, 0.79),
+    (85, 0.792),
+    (86, 0.695),
+    (87, 0.776),
+    (88, 0.768),
+    (89, 0.792),
+    (90, 0.759),
+    (91, 0.707),
+    (92, 0.708),
+    (93, 0.682),
+    (94, 0.701),
+    (95, 0.826),
+    (96, 0.815),
+    (97, 0.789),
+    (98, 0.789),
+    (99, 0.707),
+    (100, 0.687),
+    (101, 0.696),
+    (102, 0.689),
+    (103, 0.786),
+    (104, 0.787),
+    (105, 0.713),
+    (106, 0.791),
+    (107, 0.785),
+    (108, 0.791),
+    (109, 0.873),
+    (110, 0.761),
+    (111, 0.762),
+    (112, 0.762),
+    (113, 0.759),
+    (114, 0.759),
+    (115, 0.892),
+    (116, 0.892),
+    (117, 0.788),
+    (118, 0.784),
+    (119, 0.438),
+    (120, 0.138),
+    (121, 0.277),
+    (122, 0.415),
+    (123, 0.392),
+    (124, 0.392),
+    (125, 0.668),
+    (126, 0.668),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (183, 0.788),
+    (161, 0.732),
+    (162, 0.544),
+    (163, 0.544),
+    (164, 0.91),
+    (165, 0.667),
+    (166, 0.76),
+    (167, 0.76),
+    (168, 0.776),
+    (169, 0.595),
+    (170, 0.694),
+    (171, 0.626),
+    (172, 0.788),
+    (173, 0.788),
+    (174, 0.788),
+    (175, 0.788),
+    (176, 0.788),
+    (177, 0.788),
+    (178, 0.788),
+    (179, 0.788),
+    (180, 0.788),
+    (181, 0.788),
+    (182, 0.788),
+    (183, 0.788),
+    (184, 0.788),
+    (185, 0.788),
+    (186, 0.788),
+    (187, 0.788),
+    (188, 0.788),
+    (189, 0.788),
+    (190, 0.788),
+    (191, 0.788),
+    (192, 0.788),
+    (193, 0.788),
+    (194, 0.788),
+    (195, 0.788),
+    (196, 0.788),
+    (197, 0.788),
+    (198, 0.788),
+    (199, 0.788),
+    (200, 0.788),
+    (201, 0.788),
+    (202, 0.788),
+    (203, 0.788),
+    (204, 0.788),
+    (205, 0.788),
+    (206, 0.788),
+    (207, 0.788),
+    (208, 0.788),
+    (209, 0.788),
+    (210, 0.788),
+    (211, 0.788),
+    (212, 0.894),
+    (213, 0.838),
+    (214, 1.016),
+    (215, 0.458),
+    (216, 0.748),
+    (217, 0.924),
+    (218, 0.748),
+    (219, 0.918),
+    (220, 0.927),
+    (221, 0.928),
+    (222, 0.928),
+    (223, 0.834),
+    (224, 0.873),
+    (225, 0.828),
+    (226, 0.924),
+    (227, 0.924),
+    (228, 0.917),
+    (229, 0.93),
+    (230, 0.931),
+    (231, 0.463),
+    (232, 0.883),
+    (233, 0.836),
+    (234, 0.836),
+    (235, 0.867),
+    (236, 0.867),
+    (237, 0.696),
+    (238, 0.696),
+    (239, 0.874),
+    (183, 0.788),
+    (241, 0.874),
+    (242, 0.76),
+    (243, 0.946),
+    (244, 0.771),
+    (245, 0.865),
+    (246, 0.771),
+    (247, 0.888),
+    (248, 0.967),
+    (249, 0.888),
+    (250, 0.831),
+    (251, 0.873),
+    (252, 0.927),
+    (253, 0.97),
+    (183, 0.788),
+    (183, 0.788),
+)
+
+# ------------------------------------------------------------------------------
+# Glyph list for the built-in font 'Symbol'
+# ------------------------------------------------------------------------------
+symbol_glyphs = (
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (32, 0.25),
+    (33, 0.333),
+    (34, 0.713),
+    (35, 0.5),
+    (36, 0.549),
+    (37, 0.833),
+    (38, 0.778),
+    (39, 0.439),
+    (40, 0.333),
+    (41, 0.333),
+    (42, 0.5),
+    (43, 0.549),
+    (44, 0.25),
+    (45, 0.549),
+    (46, 0.25),
+    (47, 0.278),
+    (48, 0.5),
+    (49, 0.5),
+    (50, 0.5),
+    (51, 0.5),
+    (52, 0.5),
+    (53, 0.5),
+    (54, 0.5),
+    (55, 0.5),
+    (56, 0.5),
+    (57, 0.5),
+    (58, 0.278),
+    (59, 0.278),
+    (60, 0.549),
+    (61, 0.549),
+    (62, 0.549),
+    (63, 0.444),
+    (64, 0.549),
+    (65, 0.722),
+    (66, 0.667),
+    (67, 0.722),
+    (68, 0.612),
+    (69, 0.611),
+    (70, 0.763),
+    (71, 0.603),
+    (72, 0.722),
+    (73, 0.333),
+    (74, 0.631),
+    (75, 0.722),
+    (76, 0.686),
+    (77, 0.889),
+    (78, 0.722),
+    (79, 0.722),
+    (80, 0.768),
+    (81, 0.741),
+    (82, 0.556),
+    (83, 0.592),
+    (84, 0.611),
+    (85, 0.69),
+    (86, 0.439),
+    (87, 0.768),
+    (88, 0.645),
+    (89, 0.795),
+    (90, 0.611),
+    (91, 0.333),
+    (92, 0.863),
+    (93, 0.333),
+    (94, 0.658),
+    (95, 0.5),
+    (96, 0.5),
+    (97, 0.631),
+    (98, 0.549),
+    (99, 0.549),
+    (100, 0.494),
+    (101, 0.439),
+    (102, 0.521),
+    (103, 0.411),
+    (104, 0.603),
+    (105, 0.329),
+    (106, 0.603),
+    (107, 0.549),
+    (108, 0.549),
+    (109, 0.576),
+    (110, 0.521),
+    (111, 0.549),
+    (112, 0.549),
+    (113, 0.521),
+    (114, 0.549),
+    (115, 0.603),
+    (116, 0.439),
+    (117, 0.576),
+    (118, 0.713),
+    (119, 0.686),
+    (120, 0.493),
+    (121, 0.686),
+    (122, 0.494),
+    (123, 0.48),
+    (124, 0.2),
+    (125, 0.48),
+    (126, 0.549),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (183, 0.46),
+    (160, 0.25),
+    (161, 0.62),
+    (162, 0.247),
+    (163, 0.549),
+    (164, 0.167),
+    (165, 0.713),
+    (166, 0.5),
+    (167, 0.753),
+    (168, 0.753),
+    (169, 0.753),
+    (170, 0.753),
+    (171, 1.042),
+    (172, 0.713),
+    (173, 0.603),
+    (174, 0.987),
+    (175, 0.603),
+    (176, 0.4),
+    (177, 0.549),
+    (178, 0.411),
+    (179, 0.549),
+    (180, 0.549),
+    (181, 0.576),
+    (182, 0.494),
+    (183, 0.46),
+    (184, 0.549),
+    (185, 0.549),
+    (186, 0.549),
+    (187, 0.549),
+    (188, 1),
+    (189, 0.603),
+    (190, 1),
+    (191, 0.658),
+    (192, 0.823),
+    (193, 0.686),
+    (194, 0.795),
+    (195, 0.987),
+    (196, 0.768),
+    (197, 0.768),
+    (198, 0.823),
+    (199, 0.768),
+    (200, 0.768),
+    (201, 0.713),
+    (202, 0.713),
+    (203, 0.713),
+    (204, 0.713),
+    (205, 0.713),
+    (206, 0.713),
+    (207, 0.713),
+    (208, 0.768),
+    (209, 0.713),
+    (210, 0.79),
+    (211, 0.79),
+    (212, 0.89),
+    (213, 0.823),
+    (214, 0.549),
+    (215, 0.549),
+    (216, 0.713),
+    (217, 0.603),
+    (218, 0.603),
+    (219, 1.042),
+    (220, 0.987),
+    (221, 0.603),
+    (222, 0.987),
+    (223, 0.603),
+    (224, 0.494),
+    (225, 0.329),
+    (226, 0.79),
+    (227, 0.79),
+    (228, 0.786),
+    (229, 0.713),
+    (230, 0.384),
+    (231, 0.384),
+    (232, 0.384),
+    (233, 0.384),
+    (234, 0.384),
+    (235, 0.384),
+    (236, 0.494),
+    (237, 0.494),
+    (238, 0.494),
+    (239, 0.494),
+    (183, 0.46),
+    (241, 0.329),
+    (242, 0.274),
+    (243, 0.686),
+    (244, 0.686),
+    (245, 0.686),
+    (246, 0.384),
+    (247, 0.549),
+    (248, 0.384),
+    (249, 0.384),
+    (250, 0.384),
+    (251, 0.384),
+    (252, 0.494),
+    (253, 0.494),
+    (254, 0.494),
+    (183, 0.46),
+)
+
+
+class linkDest(object):
+    """link or outline destination details"""
+
+    def __init__(self, obj, rlink):
+        isExt = obj.isExternal
+        isInt = not isExt
+        self.dest = ""
+        self.fileSpec = ""
+        self.flags = 0
+        self.isMap = False
+        self.isUri = False
+        self.kind = LINK_NONE
+        self.lt = Point(0, 0)
+        self.named = ""
+        self.newWindow = ""
+        self.page = obj.page
+        self.rb = Point(0, 0)
+        self.uri = obj.uri
+        if rlink and not self.uri.startswith("#"):
+            self.uri = "#%i,%g,%g" % (rlink[0] + 1, rlink[1], rlink[2])
+        if obj.isExternal:
+            self.page = -1
+            self.kind = LINK_URI
+        if not self.uri:
+            self.page = -1
+            self.kind = LINK_NONE
+        if isInt and self.uri:
+            if self.uri.startswith("#"):
+                self.named = ""
+                self.kind = LINK_GOTO
+                ftab = self.uri[1:].split(",")
+                if len(ftab) == 3:
+                    self.page = int(ftab[0]) - 1
+                    self.lt = Point(float(ftab[1]), float(ftab[2]))
+                    self.flags = self.flags | LINK_FLAG_L_VALID | LINK_FLAG_T_VALID
+                else:
+                    try:
+                        self.page = int(ftab[0]) - 1
+                    except ValueError:  # not a page number: named destination
+                        self.kind = LINK_NAMED
+                        self.named = self.uri[1:]
+            else:
+                self.kind = LINK_NAMED
+                self.named = self.uri
+        if obj.isExternal:
+            if self.uri.startswith(("http://", "https://", "mailto:", "ftp://")):
+                self.isUri = True
+                self.kind = LINK_URI
+            elif self.uri.startswith("file://"):
+                self.fileSpec = self.uri[7:]
+                self.isUri = False
+                self.uri = ""
+                self.kind = LINK_LAUNCH
+                ftab = self.fileSpec.split("#")
+                if len(ftab) == 2:
+                    if ftab[1].startswith("page="):
+                        self.kind = LINK_GOTOR
+                        self.fileSpec = ftab[0]
+                        self.page = int(ftab[1][5:]) - 1
+            else:
+                self.isUri = True
+                self.kind = LINK_LAUNCH
+
+
+# -------------------------------------------------------------------------------
+# "Now" timestamp in PDF Format
+# -------------------------------------------------------------------------------
+def getPDFnow():
+    import time
+
+    lt = time.localtime()
+    # UTC offset in seconds WEST of Greenwich; use the DST offset only if
+    # daylight saving time is currently in effect
+    offset = time.altzone if lt.tm_isdst > 0 else time.timezone
+    hh, mm = divmod(abs(offset) // 60, 60)
+    tz = "%02i'%02i'" % (hh, mm)
+    tstamp = time.strftime("D:%Y%m%d%H%M%S", lt)
+    if offset > 0:  # west of UTC
+        tstamp += "-" + tz
+    elif offset < 0:  # east of UTC
+        tstamp += "+" + tz
+    return tstamp
+
+
+def getPDFstr(s):
+    """ Return a PDF string depending on its coding.
+
+    Notes:
+        Returns a string bracketed with either "()" or "<>" for hex values.
+        If only ASCII then "(original)" is returned, else if only 8-bit chars
+        then "(original)" with interspersed octal escapes \\nnn is returned,
+        else a string "<FEFF[hexstring]>" is returned, where [hexstring] is the
+        UTF-16BE encoding of the original.
+    """
+    if not bool(s):
+        return "()"
+
+    def make_utf16be(s):
+        r = hexlify(bytearray([254, 255]) + bytearray(s, "UTF-16BE"))
+        t = r if fitz_py2 else r.decode()
+        return "<" + t + ">"  # brackets indicate hex
+
+    # The following either returns the original string with mixed-in
+    # octal numbers \nnn for chars outside the ASCII range, or returns
+    # the UTF-16BE BOM version of the string.
+    r = ""
+    for c in s:
+        oc = ord(c)
+        if oc > 255:  # shortcut if beyond 8-bit code range
+            return make_utf16be(s)
+
+        if oc > 31 and oc < 127:  # in ASCII range
+            if c in ("(", ")", "\\"):  # these need to be escaped
+                r += "\\"
+            r += c
+            continue
+
+        if oc > 127:  # beyond ASCII
+            r += "\\%03o" % oc
+            continue
+
+        # now the white spaces
+        if oc == 8:  # backspace
+            r += "\\b"
+        elif oc == 9:  # tab
+            r += "\\t"
+        elif oc == 10:  # line feed
+            r += "\\n"
+        elif oc == 12:  # form feed
+            r += "\\f"
+        elif oc == 13:  # carriage return
+            r += "\\r"
+        else:
+            r += "\\267"  # unsupported: replace by 0xB7
+
+    return "(" + r + ")"
+
+
+def getTJstr(text, glyphs, simple, ordering):
+    """ Return a PDF string enclosed in [] brackets, suitable for the PDF TJ
+    operator.
+
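+    Notes:
+        The input string is converted to either 2 or 4 hex digits per character.
+    Args:
+        text: the string to convert.
+        glyphs: glyph table indexed by char code (e.g. Symbol, ZapfDingbats),
+                or None.
+        simple: True for simple fonts, which use 2 hex digits per character:
+                glyphs is None: use the char code as the glyph;
+                glyphs given: use the glyph number instead of the char code.
+        ordering: used for non-simple fonts, which use 4 hex digits per character:
+                ordering < 0: use glyph numbers, not char codes;
+                ordering >= 0: a CJK font, use char codes as glyphs.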
+    """
+    if text.startswith("[<") and text.endswith(">]"):  # already done
+        return text
+
+    if not bool(text):
+        return "[<>]"
+
+    if simple:  # each char or its glyph is coded as 2 hex digits (1 byte)
+        if glyphs is None:  # not Symbol, not ZapfDingbats: use char code
+            otxt = "".join(["%02x" % ord(c) if ord(c) < 256 else "b7" for c in text])
+        else:  # Symbol or ZapfDingbats: use glyphs
+            otxt = "".join(
+                ["%02x" % glyphs[ord(c)][0] if ord(c) < 256 else "b7" for c in text]
+            )
+        return "[<" + otxt + ">]"
+
+    # non-simple fonts: each char or its glyph is coded as 4 hex digits (2 bytes)
+    if ordering < 0:  # not a CJK font: use the glyphs
+        otxt = "".join(["%04x" % glyphs[ord(c)][0] for c in text])
+    else:  # CJK: use the char codes
+        otxt = "".join(["%04x" % ord(c) for c in text])
+
+    return "[<" + otxt + ">]"
+
+
+"""
+Information taken from the following web sites:
+www.din-formate.de
+www.din-formate.info/amerikanische-formate.html
+www.directtools.de/wissen/normen/iso.htm
+"""
+paperSizes = {  # known paper formats @ 72 dpi
+    "a0": (2384, 3370),
+    "a1": (1684, 2384),
+    "a10": (74, 105),
+    "a2": (1191, 1684),
+    "a3": (842, 1191),
+    "a4": (595, 842),
+    "a5": (420, 595),
+    "a6": (298, 420),
+    "a7": (210, 298),
+    "a8": (147, 210),
+    "a9": (105, 147),
+    "b0": (2835, 4008),
+    "b1": (2004, 2835),
+    "b10": (88, 125),
+    "b2": (1417, 2004),
+    "b3": (1001, 1417),
+    "b4": (709, 1001),
+    "b5": (499, 709),
+    "b6": (354, 499),
+    "b7": (249, 354),
+    "b8": (176, 249),
+    "b9": (125, 176),
+    "c0": (2599, 3677),
+    "c1": (1837, 2599),
+    "c10": (79, 113),
+    "c2": (1298, 1837),
+    "c3": (918, 1298),
+    "c4": (649, 918),
+    "c5": (459, 649),
+    "c6": (323, 459),
+    "c7": (230, 323),
+    "c8": (162, 230),
+    "c9": (113, 162),
+    "card-4x6": (288, 432),
+    "card-5x7": (360, 504),
+    "commercial": (297, 684),
+    "executive": (522, 756),
+    "invoice": (396, 612),
+    "ledger": (792, 1224),
+    "legal": (612, 1008),
+    "legal-13": (612, 936),
+    "letter": (612, 792),
+    "monarch": (279, 540),
+    "tabloid-extra": (864, 1296),
+}
+
+
+def PaperSize(s):
+    """Return a tuple (width, height) for a given paper format string.
+
+    Notes:
+        'A4-L' will return (842, 595), the values for A4 landscape.
+        Suffix '-P' and no suffix return the portrait tuple.
+        Unknown formats return (-1, -1).
+    """
+    size = s.lower()
+    f = "p"
+    if size.endswith("-l"):
+        f = "l"
+        size = size[:-2]
+    if size.endswith("-p"):
+        size = size[:-2]
+    rc = paperSizes.get(size, (-1, -1))
+    if f == "p":
+        return rc
+    return (rc[1], rc[0])
+
+
+def PaperRect(s):
+    """Return a Rect for the paper format given in string 's'.
+
+    's' must be a format string accepted by PaperSize(), which is called internally.
+    """
+    width, height = PaperSize(s)
+    return Rect(0.0, 0.0, width, height)
+
+
+def CheckParent(o):
+    if not hasattr(o, "parent") or o.parent is None:
+        raise ValueError("orphaned object: parent is None")
+
+
+def CheckColor(c):
+    if c:
+        if (
+            type(c) not in (list, tuple)
+            or len(c) not in (1, 3, 4)
+            or min(c) < 0
+            or max(c) > 1
+        ):
+            raise ValueError("need 1, 3 or 4 color components in range 0 to 1")
+
+
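+def ColorCode(c, f):
+    """Return the PDF color operator string for color 'c'.
+
+    'c' may be a float (gray) or a sequence of 1, 3 or 4 components in [0, 1].
+    f == "c" selects the stroke operators (G, RG, K), else the fill operators
+    (g, rg, k). Returns "" if 'c' is empty or None.
+    """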
+    if not c:
+        return ""
+    if hasattr(c, "__float__"):
+        c = (c,)
+    CheckColor(c)
+    if len(c) == 1:
+        s = "%g " % c[0]
+        return s + "G " if f == "c" else s + "g "
+
+    if len(c) == 3:
+        s = "%g %g %g " % tuple(c)
+        return s + "RG " if f == "c" else s + "rg "
+
+    s = "%g %g %g %g " % tuple(c)
+    return s + "K " if f == "c" else s + "k "
+
+
+def JM_TUPLE(o):
+    """Return 'o' as a tuple, rounded to 5 decimals; values with abs < 1e-4 become 0."""
+    return tuple(map(lambda x: round(x, 5) if abs(x) >= 1e-4 else 0, o))
+
+
+def CheckRect(r):
+    """Check whether an object is rect-like and neither empty nor infinite.
+
+    It must be a sequence of 4 numbers.
+    """
+    try:
+        if len(r) != 4:
+            return False
+        for i in range(4):
+            float(r[i])  # raises if the item is not a number
+    except Exception:  # not a sequence, or items are not numbers
+        return False
+
+    r = Rect(r)
+    return not (r.isEmpty or r.isInfinite)
+
+
+def CheckQuad(q):
+    """Check whether an object is a convex, non-empty quad-like.
+
+    It must be a sequence of 4 number pairs.
+    """
+    try:
+        if len(q) != 4:
+            return False
+        for i in range(4):
+            if len(q[i]) != 2:
+                return False
+            float(q[i][0])  # raises if the item is not a number
+            float(q[i][1])
+    except Exception:  # not a sequence, or items are not numbers
+        return False
+
+    return Quad(q).isConvex
+
+
+def CheckMarkerArg(quads):
+    if CheckRect(quads):
+        r = Rect(quads)
+        return (r.quad,)
+    if CheckQuad(quads):
+        return (quads,)
+    for q in quads:
+        if not (CheckRect(q) or CheckQuad(q)):
+            raise ValueError("bad quads entry")
+    return quads
+
+
+def CheckMorph(o):
+    if not bool(o):
+        return False
+    if not (type(o) in (list, tuple) and len(o) == 2):
+        raise ValueError("morph must be a sequence of length 2")
+    if not (len(o[0]) == 2 and len(o[1]) == 6):
+        raise ValueError("invalid morph parm 0")
+    if not o[1][4] == o[1][5] == 0:
+        raise ValueError("invalid morph parm 1")
+    return True
+
+
+def CheckFont(page, fontname):
+    """Return an entry in the page's font list if reference name matches.
+    """
+    for f in page.getFontList():
+        if f[4] == fontname:
+            return f
+        if f[3].lower() == fontname.lower():
+            return f
+
+
+def CheckFontInfo(doc, xref):
+    """Return a font info if present in the document.
+    """
+    for f in doc.FontInfos:
+        if xref == f[0]:
+            return f
+
+
+def UpdateFontInfo(doc, info):
+    xref = info[0]
+    found = False
+    for i, fi in enumerate(doc.FontInfos):
+        if fi[0] == xref:
+            found = True
+            break
+    if found:
+        doc.FontInfos[i] = info
+    else:
+        doc.FontInfos.append(info)
+
+
+def DUMMY(*args, **kw):
+    return
+
+
+def planishLine(p1, p2):
+    """Return matrix which flattens out the line from p1 to p2.
+
+    Args:
+        p1, p2: point_like
+    Returns:
+        Matrix which maps p1 to Point(0,0) and p2 to a point on the x axis at
+        the same distance to Point(0,0). Will always combine a rotation and a
+        translation.
+    """
+    p1 = Point(p1)
+    p2 = Point(p2)
+    return Matrix(TOOLS._hor_matrix(p1, p2))
+
+
+def ImageProperties(img):
+    """Return basic properties of an image.
+
+    Args:
+        img: bytes, bytearray, io.BytesIO object or an opened image file.
+    Returns:
+        A dictionary with keys width, height, colorspace.n, bpc, type, ext and size,
+        where 'type' is the MuPDF image type (0 to 14) and 'ext' the suitable
+        file extension.
+    """
+    if type(img) is io.BytesIO:
+        stream = img.getvalue()
+    elif hasattr(img, "read"):
+        stream = img.read()
+    elif type(img) in (bytes, bytearray):
+        stream = img
+    else:
+        raise ValueError("bad argument 'img'")
+
+    return TOOLS.image_profile(stream)
+
+
+def ConversionHeader(i, filename="unknown"):
+    t = i.lower()
+    html = """<!DOCTYPE html>
+<html>
+<head>
+<style>
+body{background-color:gray}
+div{position:relative;background-color:white;margin:1em auto}
+p{position:absolute;margin:0}
+img{position:absolute}
+</style>
+</head>
+<body>\n"""
+
+    xml = (
+        """<?xml version="1.0"?>
+<document name="%s">\n"""
+        % filename
+    )
+
+    xhtml = """<?xml version="1.0"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<style>
+body{background-color:gray}
+div{background-color:white;margin:1em;padding:1em}
+p{white-space:pre-wrap}
+</style>
+</head>
+<body>\n"""
+
+    text = ""
+    json = '{"document": "%s", "pages": [\n' % filename
+    if t == "html":
+        r = html
+    elif t == "json":
+        r = json
+    elif t == "xml":
+        r = xml
+    elif t == "xhtml":
+        r = xhtml
+    else:
+        r = text
+
+    return r
+
+
+def ConversionTrailer(i):
+    t = i.lower()
+    text = ""
+    json = "]\n}"
+    html = "</body>\n</html>\n"
+    xml = "</document>\n"
+    xhtml = html
+    if t == "html":
+        r = html
+    elif t == "json":
+        r = json
+    elif t == "xml":
+        r = xml
+    elif t == "xhtml":
+        r = xhtml
+    else:
+        r = text
+
+    return r
+
+
+def DerotateRect(cropbox, rect, deg):
+    """Calculate the non-rotated rect version.
+
+    Args:
+        cropbox: the page's /CropBox
+        rect: rectangle
+        deg: the page's /Rotate value
+    Returns:
+        Rectangle in original (/CropBox) coordinates
+    """
+    while deg < 0:
+        deg += 360
+    while deg >= 360:
+        deg -= 360
+    if deg % 90 > 0:
+        deg = 0
+    if deg == 0:  # no rotation: no-op
+        return rect
+    points = []  # store the new rect points here
+    for p in rect.quad:  # run through the rect's quad points
+        if deg == 90:
+            q = (p.y, cropbox.height - p.x)
+        elif deg == 270:
+            q = (cropbox.width - p.y, p.x)
+        else:
+            q = (cropbox.width - p.x, cropbox.height - p.y)
+        points.append(q)
+
+    r = Rect(points[0], points[0])
+    for p in points[1:]:
+        r |= p
+    return r
+
+
+def get_highlight_selection(page, start=None, stop=None, clip=None):
+    """Return rectangles of text lines between two points.
+
+    Notes:
+        The default of 'start' is top-left of 'clip'. The default of 'stop'
+        is bottom-right of 'clip'.
+
+    Args:
+        start: start point_like
+        stop: end point_like, must be 'below' start
+        clip: consider this rect_like only, default is page rectangle
+    Returns:
+        List of line bbox intersections with the area established by the
+        parameters.
+    """
+    # validate and normalize arguments
+    if clip is None:
+        clip = page.rect
+    clip = Rect(clip)
+    if start is None:
+        start = clip.tl
+    start = Point(start)
+    if stop is None:
+        stop = clip.br
+    stop = Point(stop)
+
+    # extract text of page (no images)
+    blocks = page.getText(
+        "dict", flags=TEXT_PRESERVE_LIGATURES + TEXT_PRESERVE_WHITESPACE
+    )["blocks"]
+    rectangles = []  # we will be returning this
+    lines = []  # intermediate bbox store
+    for b in blocks:
+        for line in b["lines"]:
+            bbox = Rect(line["bbox"]) & clip  # line bbox intersection
+            if bbox.isEmpty:  # do not output empty rectangles
+                continue
+            if bbox.y0 < start.y or bbox.y1 > stop.y:
+                continue  # line above or below the selection points
+            lines.append(bbox)
+
+    if lines == []:  # we did not select anything
+        return rectangles
+
+    lines.sort(key=lambda bbox: bbox.y0)  # sort result by vertical positions
+
+    bboxf = lines[0]  # potentially cut off left part of first line
+    if bboxf.y0 - start.y <= 0.1 * bboxf.height:  # close enough to the top?
+        r = Rect(start.x, bboxf.y0, bboxf.br)  # intersection rectangle
+        if r.isEmpty or r.isInfinite:
+            bboxf = Rect()  # first line will be skipped
+        else:
+            bboxf &= r
+
+    if len(lines) > 1:  # if we selected 2 or more lines
+        if not bboxf.isEmpty:
+            rectangles.append(bboxf)  # output bbox of first line
+        bboxl = lines[-1]  # and read last line
+    else:
+        bboxl = bboxf  # further restrict the only line selected
+
+    if stop.y - bboxl.y1 <= 0.1 * bboxl.height:  # close enough to bottom?
+        r = Rect(bboxl.tl, stop.x, bboxl.y1)  # intersection rectangle
+        if r.isEmpty or r.isInfinite:  # last line will be skipped
+            bboxl = Rect()
+        else:
+            bboxl &= r
+
+    if not bboxl.isEmpty:
+        rectangles.append(bboxl)
+
+    for bbox in lines[1:-1]:  # now add remaining line bboxes
+        rectangles.append(bbox)
+
+    return rectangles
+
+
+def annot_preprocess(page):
+    """Prepare for annotation insertion on the page.
+
+    Returns:
+        Old page rotation value. Temporarily sets rotation to 0 when required.
+    """
+    CheckParent(page)
+    if not page.parent.isPDF:
+        raise ValueError("not a PDF")
+    old_rotation = page.rotation
+    if old_rotation != 0:
+        page.setRotation(0)
+    return old_rotation
+
+
+def annot_postprocess(page, annot):
+    """Clean up after annotation insertion.
+
+    Set ownership flag and store annotation in page annotation dictionary.
+    """
+    annot.parent = weakref.proxy(page)
+    page._annot_refs[id(annot)] = annot
+    annot.thisown = True
+
+
+def sRGB_to_pdf(srgb):
+    """Convert sRGB color code to PDF color triple.
+
+    There is **no error checking** for performance reasons!
+
+    Args:
+        srgb: (int) RRGGBB (red, green, blue), each color in range 0 to 255.
+    Returns:
+        Tuple (red, green, blue), each item in the interval 0 <= item <= 1.
+    """
+    r = srgb >> 16
+    g = (srgb - (r << 16)) >> 8
+    b = srgb - (r << 16) - (g << 8)
+    return (r / 255.0, g / 255.0, b / 255.0)
+
+
+def make_table(rect=(0, 0, 1, 1), cols=1, rows=1):
+    """Return a list of (rows x cols) equal sized rectangles.
+
+    Notes:
+        A utility to fill a given area with table cells of equal size.
+    Args:
+        rect: rect_like to use as the table area
+        rows: number of rows
+        cols: number of columns
+    Returns:
+        A list with <rows> items, where each item is a list of <cols>
+        PyMuPDF Rect objects of equal sizes.
+    """
+    rect = Rect(rect)  # ensure this is a Rect
+    if rect.isEmpty or rect.isInfinite:
+        raise ValueError("rect must be finite and not empty")
+    tl = rect.tl
+
+    height = rect.height / rows  # height of one table cell
+    width = rect.width / cols  # width of one table cell
+    delta_h = (width, 0, width, 0)  # diff to next right rect
+    delta_v = (0, height, 0, height)  # diff to next lower rect
+
+    r = Rect(tl, tl.x + width, tl.y + height)  # first rectangle
+
+    # make the first row
+    row = [r]
+    for i in range(1, cols):
+        r += delta_h  # build next rect to the right
+        row.append(r)
+
+    # make result, starts with first row
+    rects = [row]
+    for i in range(1, rows):
+        row = rects[i - 1]  # take previously appended row
+        nrow = []  # the new row to append
+        for r in row:  # for each previous cell add its downward copy
+            nrow.append(r + delta_v)
+        rects.append(nrow)  # append new row to result
+
+    return rects
+
+
+def repair_mono_font(page, font):
+    """Repair character spacing for mono fonts.
+
+    Notes:
+        Some mono-spaced fonts are displayed with a too large character
+        distance, e.g. "a b c" instead of "abc". This utility adds an entry
+        "/W[0 65532 w]" to the descendant font(s) of font.
+        This should force viewers to use 'w' as the character width.
+
+    Args:
+        page: fitz.Page object.
+        font: fitz.Font object.
+    """
+    if not font.flags["mono"]:
+        return None
+    doc = page.parent
+    fontlist = page.getFontList()  # list of fonts on page
+    xrefs = [  # list of objects referring to font
+        f[0]
+        for f in fontlist
+        if (f[3] == font.name and f[4].startswith("F") and f[5].startswith("Identity"))
+    ]
+    if xrefs == []:  # our font does not occur
+        return
+    xrefs = set(xrefs)  # drop any double counts
+    width = int(font.glyph_advance(32) * 1000)
+    for xref in xrefs:
+        if not TOOLS.set_font_width(doc, xref, width):
+            print("Could not set width for '%s' in xref %i" % (font.name, xref))
+%}
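The color helpers `sRGB_to_pdf` and `ColorCode` above can be illustrated without a fitz installation. The following is a minimal standalone sketch, not PyMuPDF API: `srgb_to_pdf` and `color_operator` are hypothetical names. It assumes the same conventions as the helpers (an 0xRRGGBB integer; 1, 3 or 4 color components in the range 0 to 1) and splits the integer with bit masks instead of subtraction.

```python
def srgb_to_pdf(srgb):
    """Split an 0xRRGGBB integer into a PDF (r, g, b) triple in [0, 1]."""
    r = (srgb >> 16) & 0xFF  # red byte
    g = (srgb >> 8) & 0xFF   # green byte
    b = srgb & 0xFF          # blue byte
    return (r / 255.0, g / 255.0, b / 255.0)


def color_operator(color, stroke=True):
    """Return the PDF color-setting operator string for 1, 3 or 4 components.

    Hypothetical re-implementation of the mapping used by ColorCode:
    G/g for gray, RG/rg for RGB, K/k for CMYK. Upper case sets the
    stroke color, lower case the fill color.
    """
    if not color:
        return ""
    ops = {1: "G", 3: "RG", 4: "K"}
    if len(color) not in ops or min(color) < 0 or max(color) > 1:
        raise ValueError("need 1, 3 or 4 color components in range 0 to 1")
    op = ops[len(color)] if stroke else ops[len(color)].lower()
    # components are formatted with %g, as in ColorCode, and end in a space
    return " ".join("%g" % c for c in color) + " " + op + " "
```

For example, `color_operator(srgb_to_pdf(0xFF0000))` yields `"1 0 0 RG "`, the stroke operator for pure red; with `stroke=False` the fill form `"1 0 0 rg "` is returned.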
diff --git a/fitz/helper-select.i b/fitz/helper-select.i
new file mode 100644
index 0000000..d403a30
--- /dev/null
+++ b/fitz/helper-select.i
@@ -0,0 +1,374 @@
+%{
+//----------------------------------------------------------------------------
+// Helpers for document page selection - main logic was imported
+// from pdf_clean_file.c. But instead of analyzing a string-based spec of
+// selected pages, we accept a Python sequence.
+//----------------------------------------------------------------------------
+typedef struct globals_s
+{
+    pdf_document *doc;
+    fz_context *ctx;
+} globals;
+
+int string_in_names_list(fz_context *ctx, pdf_obj *p, pdf_obj *names_list)
+{
+    int n = pdf_array_len(ctx, names_list);
+    int i;
+    const char *str = pdf_to_text_string(ctx, p);
+
+    for (i = 0; i < n ; i += 2)
+    {
+        if (!strcmp(pdf_to_text_string(ctx, pdf_array_get(ctx, names_list, i)), str))
+            return 1;
+    }
+    return 0;
+}
+
+//----------------------------------------------------------------------------
+// Recreate page tree to only retain specified pages.
+//----------------------------------------------------------------------------
+void retainpage(fz_context *ctx, pdf_document *doc, pdf_obj *parent, pdf_obj *kids, int page)
+{
+    pdf_obj *pageref = pdf_lookup_page_obj(ctx, doc, page);
+
+    pdf_flatten_inheritable_page_items(ctx, pageref);
+
+    pdf_dict_put(ctx, pageref, PDF_NAME(Parent), parent);
+
+    /* Store page object in new kids array */
+    pdf_array_push(ctx, kids, pageref);
+}
+
+int dest_is_valid_page(fz_context *ctx, pdf_obj *obj, int *page_object_nums, int pagecount)
+{
+    int i;
+    int num = pdf_to_num(ctx, obj);
+
+    if (num == 0)
+        return 0;
+    for (i = 0; i < pagecount; i++)
+    {
+        if (page_object_nums[i] == num)
+            return 1;
+    }
+    return 0;
+}
+
+int dest_is_valid(fz_context *ctx, pdf_obj *o, int page_count, int *page_object_nums, pdf_obj *names_list)
+{
+    pdf_obj *p;
+
+    p = pdf_dict_get(ctx, o, PDF_NAME(A));
+    if (pdf_name_eq(ctx, pdf_dict_get(ctx, p, PDF_NAME(S)), PDF_NAME(GoTo)) &&
+        !string_in_names_list(ctx, pdf_dict_get(ctx, p, PDF_NAME(D)), names_list))
+        return 0;
+
+    p = pdf_dict_get(ctx, o, PDF_NAME(Dest));
+    if (p == NULL)  // no /Dest entry: nothing more to check
+        return 1;
+    if (pdf_is_string(ctx, p))  // named destination
+        return string_in_names_list(ctx, p, names_list);
+    if (!dest_is_valid_page(ctx, pdf_array_get(ctx, p, 0), page_object_nums, page_count))
+        return 0;  // explicit destination pointing to a dropped page
+
+    return 1;
+}
+
+int strip_outlines(fz_context *ctx, pdf_document *doc, pdf_obj *outlines, int page_count, int *page_object_nums, pdf_obj *names_list);
+
+int strip_outline(fz_context *ctx, pdf_document *doc, pdf_obj *outlines, int page_count, int *page_object_nums, pdf_obj *names_list, pdf_obj **pfirst, pdf_obj **plast)
+{
+    pdf_obj *prev = NULL;
+    pdf_obj *first = NULL;
+    pdf_obj *current;
+    int count = 0;
+
+    for (current = outlines; current != NULL; )
+    {
+        int nc;
+
+        /*********************************************************************/
+        // Strip any children to start with. This takes care of
+        // First / Last / Count for us.
+        /*********************************************************************/
+        nc = strip_outlines(ctx, doc, current, page_count, page_object_nums, names_list);
+
+        if (!dest_is_valid(ctx, current, page_count, page_object_nums, names_list))
+        {
+            if (nc == 0)
+            {
+                /*************************************************************/
+                // Outline with invalid dest and no children. Drop it by
+                // pulling the next one in here.
+                /*************************************************************/
+                pdf_obj *next = pdf_dict_get(ctx, current, PDF_NAME(Next));
+                if (next == NULL)
+                {
+                    // There is no next one to pull in
+                    if (prev != NULL)
+                        pdf_dict_del(ctx, prev, PDF_NAME(Next));
+                }
+                else if (prev != NULL)
+                {
+                    pdf_dict_put(ctx, prev, PDF_NAME(Next), next);
+                    pdf_dict_put(ctx, next, PDF_NAME(Prev), prev);
+                }
+                else
+                {
+                    pdf_dict_del(ctx, next, PDF_NAME(Prev));
+                }
+                current = next;
+            }
+            else
+            {
+                // Outline with invalid dest, but children. Just drop the dest.
+                pdf_dict_del(ctx, current, PDF_NAME(Dest));
+                pdf_dict_del(ctx, current, PDF_NAME(A));
+                current = pdf_dict_get(ctx, current, PDF_NAME(Next));
+            }
+        }
+        else
+        {
+            // Keep this one
+            if (first == NULL)
+                first = current;
+            prev = current;
+            current = pdf_dict_get(ctx, current, PDF_NAME(Next));
+            count++;
+        }
+    }
+
+    *pfirst = first;
+    *plast = prev;
+
+    return count;
+}
+
+int strip_outlines(fz_context *ctx, pdf_document *doc, pdf_obj *outlines, int page_count, int *page_object_nums, pdf_obj *names_list)
+{
+    int nc;
+    pdf_obj *first;
+    pdf_obj *last;
+
+    if (outlines == NULL)
+        return 0;
+
+    first = pdf_dict_get(ctx, outlines, PDF_NAME(First));
+    if (first == NULL)
+        nc = 0;
+    else
+        nc = strip_outline(ctx, doc, first, page_count, page_object_nums,
+                           names_list, &first, &last);
+
+    if (nc == 0)
+    {
+        pdf_dict_del(ctx, outlines, PDF_NAME(First));
+        pdf_dict_del(ctx, outlines, PDF_NAME(Last));
+        pdf_dict_del(ctx, outlines, PDF_NAME(Count));
+    }
+    else
+    {
+        int old_count = pdf_to_int(ctx, pdf_dict_get(ctx, outlines, PDF_NAME(Count)));
+        pdf_dict_put(ctx, outlines, PDF_NAME(First), first);
+        pdf_dict_put(ctx, outlines, PDF_NAME(Last), last);
+        pdf_dict_put_drop(ctx, outlines, PDF_NAME(Count), pdf_new_int(ctx, old_count > 0 ? nc : -nc));
+    }
+    return nc;
+}
+
+//----------------------------------------------------------------------------
+//   This is called by PyMuPDF:
+//   liste = page numbers to retain
+//----------------------------------------------------------------------------
+void retainpages(fz_context *ctx, globals *glo, PyObject *liste)
+{
+    pdf_obj *oldroot, *root, *pages, *kids, *countobj, *olddests;
+    Py_ssize_t argc = PySequence_Size(liste);
+    pdf_document *doc = glo->doc;
+    pdf_obj *names_list = NULL;
+    pdf_obj *outlines;
+    pdf_obj *ocproperties;
+    int pagecount = pdf_count_pages(ctx, doc);
+
+    int i;
+    int *page_object_nums;
+
+/******************************************************************************/
+//    Keep only /Type, /Pages, /Outlines and /OCProperties in the new root
+//    (a reduced /Names tree is added below) to avoid references to dropped pages
+/******************************************************************************/
+    oldroot = pdf_dict_get(ctx, pdf_trailer(ctx, doc), PDF_NAME(Root));
+    pages = pdf_dict_get(ctx, oldroot, PDF_NAME(Pages));
+    olddests = pdf_load_name_tree(ctx, doc, PDF_NAME(Dests));
+    outlines = pdf_dict_get(ctx, oldroot, PDF_NAME(Outlines));
+    ocproperties = pdf_dict_get(ctx, oldroot, PDF_NAME(OCProperties));
+
+    root = pdf_new_dict(ctx, doc, 3);
+    pdf_dict_put(ctx, root, PDF_NAME(Type), pdf_dict_get(ctx, oldroot, PDF_NAME(Type)));
+    pdf_dict_put(ctx, root, PDF_NAME(Pages), pdf_dict_get(ctx, oldroot, PDF_NAME(Pages)));
+    if (outlines)
+        pdf_dict_put(ctx, root, PDF_NAME(Outlines), outlines);
+    if (ocproperties)
+        pdf_dict_put(ctx, root, PDF_NAME(OCProperties), ocproperties);
+
+    pdf_update_object(ctx, doc, pdf_to_num(ctx, oldroot), root);
+
+    // Create a new kids array with only the pages we want to keep
+    kids = pdf_new_array(ctx, doc, 1);
+
+    // Retain pages specified
+    Py_ssize_t page;
+    fz_try(ctx) {
+        for (page = 0; page < argc; page++) {
+            PyObject *item = PySequence_ITEM(liste, page);  // new reference
+            i = (int) PyInt_AsLong(item);
+            Py_XDECREF(item);  // avoid leaking the item reference
+            if (i < 0 || i >= pagecount)
+                THROWMSG("invalid page number(s)");
+            retainpage(ctx, doc, pages, kids, i);
+        }
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+
+    // Update page count and kids array
+    countobj = pdf_new_int(ctx, pdf_array_len(ctx, kids));
+    pdf_dict_put_drop(ctx, pages, PDF_NAME(Count), countobj);
+    pdf_dict_put_drop(ctx, pages, PDF_NAME(Kids), kids);
+
+    pagecount = pdf_count_pages(ctx, doc);
+    page_object_nums = fz_calloc(ctx, pagecount, sizeof(*page_object_nums));
+    for (i = 0; i < pagecount; i++)
+    {
+        pdf_obj *pageref = pdf_lookup_page_obj(ctx, doc, i);
+        page_object_nums[i] = pdf_to_num(ctx, pageref);
+    }
+
+/******************************************************************************/
+// If we had an old Dests tree (now reformed as an olddests dictionary),
+// keep any entries in there that point to valid pages.
+// This may mean we keep more than we need, but it is safe at least.
+/******************************************************************************/
+    if (olddests)
+    {
+        pdf_obj *names = pdf_new_dict(ctx, doc, 1);
+        pdf_obj *dests = pdf_new_dict(ctx, doc, 1);
+        int len = pdf_dict_len(ctx, olddests);
+
+        names_list = pdf_new_array(ctx, doc, 32);
+
+        for (i = 0; i < len; i++)
+        {
+            pdf_obj *key = pdf_dict_get_key(ctx, olddests, i);
+            pdf_obj *val = pdf_dict_get_val(ctx, olddests, i);
+            pdf_obj *dest = pdf_dict_get(ctx, val, PDF_NAME(D));
+
+            dest = pdf_array_get(ctx, dest ? dest : val, 0);
+            if (dest_is_valid_page(ctx, dest, page_object_nums, pagecount))
+            {
+                pdf_obj *key_str = pdf_new_string(ctx, pdf_to_name(ctx, key), strlen(pdf_to_name(ctx, key)));
+                pdf_array_push_drop(ctx, names_list, key_str);
+                pdf_array_push(ctx, names_list, val);
+            }
+        }
+
+        pdf_dict_put(ctx, dests, PDF_NAME(Names), names_list);
+        pdf_dict_put(ctx, names, PDF_NAME(Dests), dests);
+        pdf_dict_put(ctx, root, PDF_NAME(Names), names);
+
+        pdf_drop_obj(ctx, names);
+        pdf_drop_obj(ctx, dests);
+        pdf_drop_obj(ctx, olddests);
+    }
+
+/*****************************************************************************/
+// Edit each page's /Annots array to remove any links pointing to nowhere.
+/*****************************************************************************/
+    for (i = 0; i < pagecount; i++)
+    {
+        pdf_obj *pageref = pdf_lookup_page_obj(ctx, doc, i);
+
+        pdf_obj *annots = pdf_dict_get(ctx, pageref, PDF_NAME(Annots));
+
+        int len = pdf_array_len(ctx, annots);
+        int j;
+
+        for (j = 0; j < len; j++)
+        {
+            pdf_obj *o = pdf_array_get(ctx, annots, j);
+
+            if (!pdf_name_eq(ctx, pdf_dict_get(ctx, o, PDF_NAME(Subtype)), PDF_NAME(Link)))
+                continue;
+
+            if (!dest_is_valid(ctx, o, pagecount, page_object_nums, names_list))
+            {
+                // Remove this annotation
+                pdf_array_delete(ctx, annots, j);
+                len--;
+                j--;
+            }
+        }
+    }
+
+    if (strip_outlines(ctx, doc, outlines, pagecount, page_object_nums, names_list) == 0)
+    {
+        pdf_dict_del(ctx, root, PDF_NAME(Outlines));
+    }
+
+    fz_free(ctx, page_object_nums);
+    pdf_drop_obj(ctx, names_list);
+    pdf_drop_obj(ctx, root);
+}
+
+PyObject *remove_dest_range(fz_context *ctx, pdf_document *pdf, int first, int last)
+{
+    int i, pno, pagecount = pdf_count_pages(ctx, pdf);
+    if (!INRANGE(first, 0, pagecount-1) ||
+        !INRANGE(last, 0, pagecount-1) ||
+        (first > last))
+        Py_RETURN_NONE;
+    fz_try(ctx) {
+        for (i = 0; i < pagecount; i++) {
+            if (INRANGE(i, first, last)) continue;
+
+            pdf_obj *pageref = pdf_lookup_page_obj(ctx, pdf, i);
+            pdf_obj *annots = pdf_dict_get(ctx, pageref, PDF_NAME(Annots));
+            pdf_obj *target;
+            if (!annots) continue;
+            int len = pdf_array_len(ctx, annots);
+            int j;
+            for (j = len - 1; j >= 0; j -= 1) {
+                pdf_obj *o = pdf_array_get(ctx, annots, j);
+                if (!pdf_name_eq(ctx, pdf_dict_get(ctx, o, PDF_NAME(Subtype)), PDF_NAME(Link)))
+                    continue;
+                pdf_obj *action = pdf_dict_get(ctx, o, PDF_NAME(A));
+                pdf_obj *dest =  pdf_dict_get(ctx, o, PDF_NAME(Dest));
+                if (action) {
+                    if (!pdf_name_eq(ctx, pdf_dict_get(ctx, action,
+                        PDF_NAME(S)), PDF_NAME(GoTo)))
+                        continue;
+                    dest = pdf_dict_get(ctx, action, PDF_NAME(D));
+                }
+                pno = -1;
+                if (pdf_is_array(ctx, dest)) {
+                    target = pdf_array_get(ctx, dest, 0);
+                    pno = pdf_lookup_page_number(ctx, pdf, target);
+                }
+                else if (pdf_is_string(ctx, dest)) {
+                    pno = pdf_lookup_anchor(ctx, pdf,
+                                            pdf_to_text_string(ctx, dest),
+                                            NULL, NULL);
+                }
+                if (INRANGE(pno, first, last)) {
+                    pdf_array_delete(ctx, annots, j);
+                }
+            }
+        }
+    }
+    fz_catch(ctx) {
+        return NULL;
+    }
+    Py_RETURN_NONE;
+}
+%}
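The outline pruning done by `strip_outline` and `strip_outlines` follows a simple rule: children are pruned first; an item whose destination points to a dropped page is removed if it has no surviving children, and otherwise kept with its destination deleted. Below is a simplified Python model of that rule, not the C implementation. It assumes outlines are nested lists of hypothetical dicts (`title`, `page`, `kids`) instead of `/First` and `/Next` linked PDF objects, and it omits the `/Prev`, `/Last` and `/Count` bookkeeping of the C code.

```python
def prune_outlines(items, valid_pages):
    """Return (pruned copy of 'items', number of kept top-level items).

    items: list of hypothetical outline dicts {"title", "page", "kids"},
           where "page" is None when the item has no destination.
    valid_pages: set of page numbers that survive the page selection.
    """
    kept = []
    for item in items:
        # prune the children first, like strip_outline() recursing into /First
        kids, _ = prune_outlines(item.get("kids", []), valid_pages)
        page = item.get("page")
        if page is None or page in valid_pages:
            # no destination, or destination still exists: keep the item
            kept.append(dict(item, kids=kids))
        elif kids:
            # invalid destination but surviving children: keep, drop the target
            kept.append(dict(item, page=None, kids=kids))
        # otherwise: invalid destination and no children, so the item is dropped
    return kept, len(kept)
```

Returning a pruned copy rather than editing in place keeps the sketch side-effect free; the C code instead rewires the existing PDF objects.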
diff --git a/fitz/helper-stext.i b/fitz/helper-stext.i
new file mode 100644
index 0000000..9b515c6
--- /dev/null
+++ b/fitz/helper-stext.i
@@ -0,0 +1,581 @@
+%{
+//-----------------------------------------------------------------------------
+// Make a text page directly from an fz_page
+//-----------------------------------------------------------------------------
+fz_stext_page *JM_new_stext_page_from_page(fz_context *ctx, fz_page *page, int flags)
+{
+    if (!page) return NULL;
+    fz_stext_page *tp = NULL;
+    fz_rect rect;
+    fz_device *dev = NULL;
+    fz_var(dev);
+    fz_var(tp);
+    fz_stext_options options = { 0 };
+    options.flags = flags;
+    fz_try(ctx) {
+        rect = fz_bound_page(ctx, page);
+        tp = fz_new_stext_page(ctx, rect);
+        dev = fz_new_stext_device(ctx, tp, &options);
+        fz_run_page_contents(ctx, page, dev, fz_identity, NULL);
+        fz_close_device(ctx, dev);
+    }
+    fz_always(ctx) {
+        fz_drop_device(ctx, dev);
+    }
+    fz_catch(ctx) {
+        fz_drop_stext_page(ctx, tp);
+        fz_rethrow(ctx);
+    }
+    return tp;
+}
+
+
+//-----------------------------------------------------------------------------
+// Return the replacement for MuPDF's error rune: U+00B7 (middle dot), UTF-8 encoded
+//-----------------------------------------------------------------------------
+PyObject *JM_repl_char()
+{
+    const char data[2] = {194, 183};
+    return PyUnicode_FromStringAndSize(data, 2);
+}
+
+//-----------------------------------------------------------------------------
+// APPEND non-ascii runes in unicode escape format to a fz_buffer
+//-----------------------------------------------------------------------------
+void JM_append_rune(fz_context *ctx, fz_buffer *buff, int ch)
+{
+    if (ch >= 32 && ch <= 127)
+    {
+        fz_append_byte(ctx, buff, ch);
+    }
+    else if (ch <= 0xffff)  // 4 hex digits
+    {
+        fz_append_printf(ctx, buff, "\\u%04x", ch);
+    }
+    else  // 8 hex digits
+    {
+        fz_append_printf(ctx, buff, "\\U%08x", ch);
+    }
+}
+
+
+//-----------------------------------------------------------------------------
+// WRITE non-ascii runes in unicode escape format to a fz_output
+//-----------------------------------------------------------------------------
+void JM_write_rune(fz_context *ctx, fz_output *out, int ch)
+{
+    if (ch >= 32 && ch <= 127)
+    {
+        fz_write_byte(ctx, out, ch);
+    }
+    else if (ch <= 0xffff)  // 4 hex digits
+    {
+        fz_write_printf(ctx, out, "\\u%04x", ch);
+    }
+    else  // 8 hex digits
+    {
+        fz_write_printf(ctx, out, "\\U%08x", ch);
+    }
+}
+
+
+//-----------------------------------------------------------------------------
+// Plain text output. A modified copy of fz_print_stext_page_as_text:
+// a new-line is written between two lines of a block only if the previous
+// line does not already end with one (which would otherwise give 2 new-lines).
+//-----------------------------------------------------------------------------
+void
+JM_print_stext_page_as_text(fz_context *ctx, fz_output *out, fz_stext_page *page)
+{
+    fz_stext_block *block = NULL;
+    fz_stext_line *line = NULL;
+    fz_stext_char *ch = NULL;
+    int last_char = 0;
+
+    for (block = page->first_block; block; block = block->next)
+    {
+        if (block->type == FZ_STEXT_BLOCK_TEXT)
+        {
+            int line_n = 0;
+            for (line = block->u.t.first_line; line; line = line->next)
+            {
+                if (line_n > 0 && last_char != 10)
+                {
+                    fz_write_string(ctx, out, "\n");
+                }
+                line_n++;
+                for (ch = line->first_char; ch; ch = ch->next)
+                {
+                    JM_write_rune(ctx, out, ch->c);
+                    last_char = ch->c;
+                }
+            }
+            fz_write_string(ctx, out, "\n");
+        }
+    }
+}
+
+//-----------------------------------------------------------------------------
+// Functions for wordlist output
+//-----------------------------------------------------------------------------
+int JM_append_word(fz_context *ctx, PyObject *lines, fz_buffer *buff, fz_rect *wbbox,
+                   int block_n, int line_n, int word_n)
+{
+    PyObject *s = JM_EscapeStrFromBuffer(ctx, buff);
+    PyObject *litem = Py_BuildValue("ffffOiii",
+                                    wbbox->x0,
+                                    wbbox->y0,
+                                    wbbox->x1,
+                                    wbbox->y1,
+                                    s,
+                                    block_n, line_n, word_n);
+    LIST_APPEND_DROP(lines, litem);
+    Py_DECREF(s);
+    wbbox->x0 = wbbox->y0 = wbbox->x1 = wbbox->y1 = 0;
+    return word_n + 1;                 // word counter
+}
+
+//-----------------------------------------------------------------------------
+// Functions for dictionary output
+//-----------------------------------------------------------------------------
+
+// create the char rect from its quad
+fz_rect JM_char_bbox(fz_stext_line *line, fz_stext_char *ch)
+{
+    fz_rect r = fz_rect_from_quad(ch->quad);
+    if (!fz_is_empty_rect(r)) return r;
+    // empty rect (e.g. erroneous font): use the font size as missing height / width
+    if ((r.y1 - r.y0) <= FLT_EPSILON) r.y0 = r.y1 - ch->size;
+    if ((r.x1 - r.x0) <= FLT_EPSILON) r.x0 = r.x1 - ch->size;
+    return r;
+}
+
+static int detect_super_script(fz_stext_line *line, fz_stext_char *ch)
+{
+    if (line->wmode == 0 && line->dir.x == 1 && line->dir.y == 0)
+        return ch->origin.y < line->first_char->origin.y - ch->size * 0.1f;
+    return 0;
+}
+
+static int JM_char_font_flags(fz_context *ctx, fz_font *font, fz_stext_line *line, fz_stext_char *ch)
+{
+    int flags = detect_super_script(line, ch);
+    flags += fz_font_is_italic(ctx, font) * TEXT_FONT_ITALIC;
+    flags += fz_font_is_serif(ctx, font) * TEXT_FONT_SERIFED;
+    flags += fz_font_is_monospaced(ctx, font) * TEXT_FONT_MONOSPACED;
+    flags += fz_font_is_bold(ctx, font) * TEXT_FONT_BOLD;
+    return flags;
+}
+
+
+static PyObject *JM_make_spanlist(fz_context *ctx, fz_stext_line *line, int raw, fz_buffer *buff)
+{
+    PyObject *span = NULL, *char_list = NULL, *char_dict;
+    PyObject *span_list = PyList_New(0);
+    fz_clear_buffer(ctx, buff);
+    fz_stext_char *ch;
+    fz_rect span_rect;
+    typedef struct style_s
+    {float size; int flags; char *font; int color;} char_style;
+
+    char_style old_style = { -1, -1, "", -1 }, style;
+
+    for (ch = line->first_char; ch; ch = ch->next)
+    {
+        fz_rect r = JM_char_bbox(line, ch);
+        int flags = JM_char_font_flags(ctx, ch->font, line, ch);
+        style.size = ch->size;
+        style.flags = flags;
+        style.font = (char *) fz_font_name(ctx, ch->font);
+        style.color = ch->color;
+
+        if (style.size != old_style.size ||
+            style.flags != old_style.flags ||
+            style.color != old_style.color ||
+            strcmp(style.font, old_style.font) != 0)  // changed -> new span
+        {
+            if (old_style.size >= 0)  // not 1st one, output previous span
+            {
+                if (raw)  // put character list in the span
+                {
+                    DICT_SETITEM_DROP(span, dictkey_chars, char_list);
+                    char_list = NULL;
+                }
+                else  // put text string in the span
+                {
+                    DICT_SETITEM_DROP(span, dictkey_text, JM_EscapeStrFromBuffer(ctx, buff));
+                    fz_clear_buffer(ctx, buff);
+                }
+
+                DICT_SETITEM_DROP(span, dictkey_bbox, JM_py_from_rect(span_rect));
+
+                LIST_APPEND_DROP(span_list, span);
+                span = NULL;
+            }
+
+            span = PyDict_New();
+
+            DICT_SETITEM_DROP(span, dictkey_size, Py_BuildValue("f", style.size));
+            DICT_SETITEM_DROP(span, dictkey_flags, Py_BuildValue("i", style.flags));
+            DICT_SETITEM_DROP(span, dictkey_font, JM_EscapeStrFromStr(style.font));
+            DICT_SETITEM_DROP(span, dictkey_color, Py_BuildValue("i", style.color));
+
+            old_style = style;
+            span_rect = r;
+        }
+        span_rect = fz_union_rect(span_rect, r);
+        if (raw)  // make and append a char dict
+        {
+            char_dict = PyDict_New();
+
+            DICT_SETITEM_DROP(char_dict, dictkey_origin,
+                          Py_BuildValue("ff", ch->origin.x, ch->origin.y));
+
+            DICT_SETITEM_DROP(char_dict, dictkey_bbox,
+                          Py_BuildValue("ffff", r.x0, r.y0, r.x1, r.y1));
+
+            DICT_SETITEM_DROP(char_dict, dictkey_c,
+                          Py_BuildValue("C", ch->c));
+
+            if (!char_list)
+            {
+                char_list = PyList_New(0);
+            }
+            LIST_APPEND_DROP(char_list, char_dict);
+        }
+        else  // add character byte to buffer
+        {
+            JM_append_rune(ctx, buff, ch->c);
+        }
+    }
+    // all characters processed, now flush remaining span
+    if (span)
+    {
+        if (raw)
+        {
+            DICT_SETITEM_DROP(span, dictkey_chars, char_list);
+            char_list = NULL;
+        }
+        else
+        {
+            DICT_SETITEM_DROP(span, dictkey_text, JM_EscapeStrFromBuffer(ctx, buff));
+            fz_clear_buffer(ctx, buff);
+        }
+        DICT_SETITEM_DROP(span, dictkey_bbox, JM_py_from_rect(span_rect));
+
+        LIST_APPEND_DROP(span_list, span);
+        span = NULL;
+    }
+    return span_list;
+}
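+
+// Illustrative sketch of one span dict built above. The key strings are
+// assumed to equal the dictkey_* suffixes; values show the produced types:
+//
+//     raw == 0: {"size": float, "flags": int, "font": str, "color": int,
+//                "text": str, "bbox": (x0, y0, x1, y1)}
+//     raw != 0: same keys, but "chars" replaces "text":
+//                "chars": [{"origin": (x, y), "bbox": (x0, y0, x1, y1),
+//                           "c": str}, ...]
+//
+// A new span starts whenever size, flags, color or font name changes, so a
+// line whose last word is bold typically produces two spans.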
+
+static void JM_make_image_block(fz_context *ctx, fz_stext_block *block, PyObject *block_dict)
+{
+    fz_image *image = block->u.i.image;
+    fz_buffer *buf = NULL, *freebuf = NULL;
+    fz_compressed_buffer *buffer = fz_compressed_image_buffer(ctx, image);
+    fz_var(buf);
+    fz_var(freebuf);
+    int n = fz_colorspace_n(ctx, image->colorspace);
+    int w = image->w;
+    int h = image->h;
+    const char *ext = NULL;
+    int type = FZ_IMAGE_UNKNOWN;
+    if (buffer)
+        type = buffer->params.type;
+    if (type < FZ_IMAGE_BMP || type == FZ_IMAGE_JBIG2)
+        type = FZ_IMAGE_UNKNOWN;
+    PyObject *bytes = NULL;
+    fz_var(bytes);
+    fz_try(ctx) {
+        if (buffer && type != FZ_IMAGE_UNKNOWN) {
+            buf = buffer->buffer;
+            ext = JM_image_extension(type);
+        }
+        else {
+            buf = freebuf = fz_new_buffer_from_image_as_png(ctx, image, fz_default_color_params);
+            ext = "png";
+        }
+        if (PY_MAJOR_VERSION > 2) {
+            bytes = JM_BinFromBuffer(ctx, buf);
+        }
+        else {
+            bytes = JM_BArrayFromBuffer(ctx, buf);
+        }
+    }
+    fz_always(ctx) {
+        if (!bytes)
+            bytes = JM_BinFromChar("");
+        DICT_SETITEM_DROP(block_dict, dictkey_width,
+                          Py_BuildValue("i", w));
+        DICT_SETITEM_DROP(block_dict, dictkey_height,
+                          Py_BuildValue("i", h));
+        DICT_SETITEM_DROP(block_dict, dictkey_ext,
+                          Py_BuildValue("s", ext));
+        DICT_SETITEM_DROP(block_dict, dictkey_colorspace,
+                          Py_BuildValue("i", n));
+        DICT_SETITEM_DROP(block_dict, dictkey_xres,
+                          Py_BuildValue("i", image->xres));
+        DICT_SETITEM_DROP(block_dict, dictkey_yres,
+                          Py_BuildValue("i", image->yres));
+        DICT_SETITEM_DROP(block_dict, dictkey_bpc,
+                          Py_BuildValue("i", (int) image->bpc));
+        DICT_SETITEM_DROP(block_dict, dictkey_image, bytes);
+
+        fz_drop_buffer(ctx, freebuf);
+    }
+    fz_catch(ctx) {;}
+    return;
+}
+
+static void JM_make_text_block(fz_context *ctx, fz_stext_block *block, PyObject *block_dict, int raw, fz_buffer *buff)
+{
+    fz_stext_line *line;
+    PyObject *line_list = PyList_New(0), *line_dict;
+
+    for (line = block->u.t.first_line; line; line = line->next)
+    {
+        line_dict = PyDict_New();
+
+        DICT_SETITEM_DROP(line_dict, dictkey_wmode,
+                      Py_BuildValue("i", line->wmode));
+        DICT_SETITEM_DROP(line_dict, dictkey_dir,
+                      Py_BuildValue("ff", line->dir.x, line->dir.y));
+        DICT_SETITEM_DROP(line_dict, dictkey_bbox,
+                      JM_py_from_rect(line->bbox));
+        DICT_SETITEM_DROP(line_dict, dictkey_spans,
+                       JM_make_spanlist(ctx, line, raw, buff));
+
+        LIST_APPEND_DROP(line_list, line_dict);
+    }
+    DICT_SETITEM_DROP(block_dict, dictkey_lines, line_list);
+    return;
+}
+
+void JM_make_textpage_dict(fz_context *ctx, fz_stext_page *tp, PyObject *page_dict, int raw)
+{
+    fz_stext_block *block;
+    fz_buffer *text_buffer = fz_new_buffer(ctx, 64);
+    PyObject *block_dict, *block_list = PyList_New(0);
+    for (block = tp->first_block; block; block = block->next)
+    {
+        block_dict = PyDict_New();
+
+        DICT_SETITEM_DROP(block_dict, dictkey_type, Py_BuildValue("i", block->type));
+        DICT_SETITEM_DROP(block_dict, dictkey_bbox, JM_py_from_rect(block->bbox));
+
+        if (block->type == FZ_STEXT_BLOCK_IMAGE)
+        {
+            JM_make_image_block(ctx, block, block_dict);
+        }
+        else
+        {
+            JM_make_text_block(ctx, block, block_dict, raw, text_buffer);
+        }
+
+        LIST_APPEND_DROP(block_list, block_dict);
+    }
+    DICT_SETITEM_DROP(page_dict, dictkey_blocks, block_list);
+    fz_drop_buffer(ctx, text_buffer);
+}
+
+
+fz_buffer *JM_object_to_buffer(fz_context *ctx, pdf_obj *what, int compress, int ascii)
+{
+    fz_buffer *res=NULL;
+    fz_output *out=NULL;
+    fz_var(res);
+    fz_var(out);
+    fz_try(ctx) {
+        res = fz_new_buffer(ctx, 1024);
+        out = fz_new_output_with_buffer(ctx, res);
+        pdf_print_obj(ctx, out, what, compress, ascii);
+    }
+    fz_always(ctx) {
+        fz_drop_output(ctx, out);
+    }
+    fz_catch(ctx) {
+        fz_drop_buffer(ctx, res);  // do not leak the partially filled buffer
+        return NULL;
+    }
+    fz_terminate_buffer(ctx, res);
+    return res;
+}
+
+//-----------------------------------------------------------------------------
+// Merge the /Resources object created by a text PDF device into the page.
+// The device may have created multiple /ExtGState/Alp? and /Font/F? objects.
+// These must be renamed (renumbered) so that they do not overwrite page
+// objects created by previous executions.
+// Returns (n, m): the numbers at which renumbering of the copied /Alp<n>
+// and /F<m> objects started.
+//-----------------------------------------------------------------------------
+PyObject *JM_merge_resources(fz_context *ctx, pdf_page *page, pdf_obj *temp_res)
+{
+    // page objects /Resources, /Resources/ExtGState, /Resources/Font
+    pdf_obj *resources = pdf_dict_get(ctx, page->obj, PDF_NAME(Resources));
+    pdf_obj *main_extg = pdf_dict_get(ctx, resources, PDF_NAME(ExtGState));
+    pdf_obj *main_fonts = pdf_dict_get(ctx, resources, PDF_NAME(Font));
+
+    // text pdf device objects /ExtGState, /Font
+    pdf_obj *temp_extg = pdf_dict_get(ctx, temp_res, PDF_NAME(ExtGState));
+    pdf_obj *temp_fonts = pdf_dict_get(ctx, temp_res, PDF_NAME(Font));
+
+    int max_alp = 0, max_fonts = 0, i, n;
+    char start_str[32] = {0};  // string for comparison
+    char text[32] = {0};  // string for comparison
+
+    // Handle /Alp objects
+    if (pdf_is_dict(ctx, temp_extg))  // any created at all?
+    {
+        n = pdf_dict_len(ctx, temp_extg);
+        if (pdf_is_dict(ctx, main_extg))  // does page have /ExtGState yet?
+        {
+            for (i = 0; i < pdf_dict_len(ctx, main_extg); i++)
+            {   // get highest number of objects named /Alp?
+                char *alp = (char *) pdf_to_name(ctx, pdf_dict_get_key(ctx, main_extg, i));
+                if (strncmp(alp, "Alp", 3) != 0) continue;
+                if (strcmp(start_str, alp) < 0) strcpy(start_str, alp);
+            }
+            while (strcmp(text, start_str) < 0)
+            {   // compute next available number
+                fz_snprintf(text, sizeof(text), "Alp%d", max_alp);
+                max_alp++;
+            }
+        }
+        else  // create a /ExtGState for the page
+            main_extg = pdf_dict_put_dict(ctx, resources, PDF_NAME(ExtGState), n);
+
+        for (i = 0; i < n; i++)  // copy over renumbered /Alp objects
+        {
+            fz_snprintf(text, sizeof(text), "Alp%d", i + max_alp);  // new name
+            pdf_obj *val = pdf_dict_get_val(ctx, temp_extg, i);
+            pdf_dict_puts(ctx, main_extg, text, val);
+        }
+    }
+
+    text[0] = 0;  // empty comparison string
+    start_str[0] = 0;  // empty comparison string
+
+    if (pdf_is_dict(ctx, main_fonts))  // does page have any fonts yet?
+    {
+        for (i = 0; i < pdf_dict_len(ctx, main_fonts); i++)
+        {   // get highest number of fonts named /Fxxx
+            char *font = (char *) pdf_to_name(ctx, pdf_dict_get_key(ctx, main_fonts, i));
+            if (strncmp(font, "F", 1) != 0) continue;
+            if (strcmp(start_str, font) < 0 || strlen(start_str) < strlen(font))
+                strcpy(start_str, font);
+        }
+        while (strcmp(text, start_str) < 0)
+        {   // compute next available number
+            fz_snprintf(text, sizeof(text), "F%d", max_fonts);
+            max_fonts++;
+        }
+    }
+    else  // create a Resources/Font for the page
+        main_fonts = pdf_dict_put_dict(ctx, resources, PDF_NAME(Font), 2);
+
+    for (i = 0; i < pdf_dict_len(ctx, temp_fonts); i++)
+    {   // copy over renumbered font objects
+        fz_snprintf(text, sizeof(text), "F%d", i + max_fonts);
+        pdf_obj *val = pdf_dict_get_val(ctx, temp_fonts, i);
+        pdf_dict_puts(ctx, main_fonts, text, val);
+    }
+    return Py_BuildValue("ii", max_alp, max_fonts); // next available numbers
+}
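+
+// Worked example (traced through the loops above): the page already has
+// /ExtGState entries /Alp0, /Alp1 and /Font entries /F0, /F1; the text
+// device created one /ExtGState and two /Font objects. They are copied as
+// /Alp2, /F2 and /F3, and the function returns (2, 2).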
+
+
+//-----------------------------------------------------------------------------
+// Version of fz_show_string that also accepts a UCDN script code, which is
+// passed to fz_encode_character_with_fallback to select a fallback font.
+//-----------------------------------------------------------------------------
+fz_matrix JM_show_string(fz_context *ctx, fz_text *text, fz_font *user_font, fz_matrix trm, const char *s, int wmode, int bidi_level, fz_bidi_direction markup_dir, fz_text_language language, int script)
+{
+    fz_font *font;
+    int gid, ucs;
+    float adv;
+
+    while (*s)
+    {
+        s += fz_chartorune(&ucs, s);
+        gid = fz_encode_character_with_fallback(ctx, user_font, ucs, script, language, &font);
+        fz_show_glyph(ctx, text, font, trm, gid, ucs, wmode, bidi_level, markup_dir, language);
+        adv = fz_advance_glyph(ctx, font, gid, wmode);
+        if (wmode == 0)
+            trm = fz_pre_translate(trm, adv, 0);
+        else
+            trm = fz_pre_translate(trm, 0, -adv);
+    }
+
+    return trm;
+}
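+
+// Usage sketch (hypothetical caller; 'text' comes from fz_new_text(), 'font'
+// is any loaded fz_font, script 0 is assumed to mean UCDN_SCRIPT_COMMON).
+// The returned matrix is advanced past the last glyph, so consecutive calls
+// continue on the same line:
+//
+//     trm = JM_show_string(ctx, text, font, trm, "abc", 0, 0,
+//                          FZ_BIDI_LTR, FZ_LANG_UNSET, 0);
+//     trm = JM_show_string(ctx, text, font, trm, "def", 0, 0,
+//                          FZ_BIDI_LTR, FZ_LANG_UNSET, 0);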
+
+
+//-----------------------------------------------------------------------------
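+// Return an fz_font built from the given parameters. They are tried in this
+// order: fontfile, fontbuffer, CJK ordering (if > -1), Base-14 or builtin
+// fontname, and finally a Noto or fallback font for script / lang.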
+//-----------------------------------------------------------------------------
+fz_font *JM_get_font(fz_context *ctx,
+    char *fontname,
+    char *fontfile,
+    PyObject *fontbuffer,
+    int script,
+    int lang,
+    int ordering,
+    int is_bold,
+    int is_italic,
+    int is_serif)
+{
+    const unsigned char *data = NULL;
+    int size, index=0;
+    fz_buffer *res = NULL;
+    fz_font *font = NULL;
+    fz_try(ctx) {
+        if (fontfile) goto have_file;
+        if (EXISTS(fontbuffer)) goto have_buffer;
+        if (ordering > -1) goto have_cjk;
+        if (fontname) goto have_base14;
+        goto have_noto;
+
+        // Base-14 font
+        have_base14:;
+        data = fz_lookup_base14_font(ctx, fontname, &size);
+        if (data) font = fz_new_font_from_memory(ctx, fontname, data, size, 0, 0);
+        if(font) goto fertig;
+
+        data = fz_lookup_builtin_font(ctx, fontname, is_bold, is_italic, &size);
+        if (data) font = fz_new_font_from_memory(ctx, fontname, data, size, 0, 0);
+        goto fertig;
+
+        // CJK font
+        have_cjk:;
+        data = fz_lookup_cjk_font(ctx, ordering, &size, &index);
+        if (data) font = fz_new_font_from_memory(ctx, NULL, data, size, index, 0);
+        goto fertig;
+
+        // fontfile
+        have_file:;
+        font = fz_new_font_from_file(ctx, NULL, fontfile, index, 0);
+        goto fertig;
+
+        // fontbuffer
+        have_buffer:;
+        res = JM_BufferFromBytes(ctx, fontbuffer);
+        font = fz_new_font_from_buffer(ctx, NULL, res, index, 0);
+        goto fertig;
+
+        // Check for NOTO font
+        have_noto:;
+        data = fz_lookup_noto_font(ctx, script, lang, &size, &index);
+        if (data) font = fz_new_font_from_memory(ctx, NULL, data, size, index, 0);
+        if (font) goto fertig;
+        font = fz_load_fallback_font(ctx, script, lang, is_serif, is_bold, is_italic);
+        goto fertig;
+
+        fertig:;
+        if (!font) THROWMSG("could not find a matching font");
+    }
+    fz_always(ctx) {
+        fz_drop_buffer(ctx, res);
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return font;
+}
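+
+// Example calls (sketch; Py_None is assumed to stand for "no font buffer",
+// and script / lang do not matter for these two branches). The caller owns
+// the returned font:
+//
+//     // Base-14 font by name
+//     fz_font *f1 = JM_get_font(ctx, "Helvetica", NULL, Py_None,
+//                               0, 0, -1, 0, 0, 0);
+//     // CJK font: ordering > -1 wins over any fontname
+//     fz_font *f2 = JM_get_font(ctx, NULL, NULL, Py_None,
+//                               0, 0, FZ_ADOBE_JAPAN, 0, 0, 0);
+//     fz_drop_font(ctx, f1);
+//     fz_drop_font(ctx, f2);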
+
+%}
diff --git a/fitz/helper-xobject.i b/fitz/helper-xobject.i

new file mode 100644

index 0000000..902d1de
--- /dev/null
+++ b/fitz/helper-xobject.i
@@ -0,0 +1,217 @@
+%{
+//-----------------------------------------------------------------------------
+// Read and concatenate a PDF page's /Contents object(s) in a buffer
+//-----------------------------------------------------------------------------
+fz_buffer *JM_read_contents(fz_context * ctx, pdf_obj * pageref)
+{
+    fz_buffer *res = NULL, *nres = NULL;
+    int i;
+    fz_var(res);
+    fz_var(nres);
+    fz_try(ctx) {
+        pdf_obj *contents = pdf_dict_get(ctx, pageref, PDF_NAME(Contents));
+        if (pdf_is_array(ctx, contents)) {
+            res = fz_new_buffer(ctx, 1024);
+            for (i = 0; i < pdf_array_len(ctx, contents); i++) {
+                nres = pdf_load_stream(ctx, pdf_array_get(ctx, contents, i));
+                fz_append_buffer(ctx, res, nres);
+                fz_drop_buffer(ctx, nres);
+                nres = NULL;
+            }
+        }
+        else if (contents) {
+            res = pdf_load_stream(ctx, contents);
+        }
+    }
+    fz_catch(ctx) {
+        fz_drop_buffer(ctx, nres);  // release partial results before rethrow
+        fz_drop_buffer(ctx, res);
+        fz_rethrow(ctx);
+    }
+    return res;
+}
+
+//-----------------------------------------------------------------------------
+// Make an XObject from a PDF page
+// If xref is positive, reuse that existing object instead of creating one.
+//-----------------------------------------------------------------------------
+pdf_obj *JM_xobject_from_page(fz_context * ctx, pdf_document * pdfout, fz_page * fsrcpage, int xref, pdf_graft_map *gmap)
+{
+    fz_buffer *res = NULL;
+    pdf_obj *xobj1 = NULL, *resources = NULL, *o, *spageref;
+    fz_rect mediabox;
+    fz_var(xobj1);
+
+    fz_try(ctx) {
+        pdf_page *srcpage = pdf_page_from_fz_page(ctx, fsrcpage);
+        spageref = srcpage->obj;
+        mediabox = pdf_to_rect(ctx, pdf_dict_get_inheritable(ctx, spageref, PDF_NAME(MediaBox)));
+
+        if (xref > 0) {
+            xobj1 = pdf_new_indirect(ctx, pdfout, xref, 0);
+        }
+        else {
+            // Deep-copy resources object of source page
+            o = pdf_dict_get_inheritable(ctx, spageref, PDF_NAME(Resources));
+            if (gmap) // use graftmap when possible
+                resources = pdf_graft_mapped_object(ctx, gmap, o);
+            else
+                resources = pdf_graft_object(ctx, pdfout, o);
+
+            // get the contents of the source page
+            res = JM_read_contents(ctx, spageref);
+
+            //-------------------------------------------------------------
+            // create XObject representing the source page
+            //-------------------------------------------------------------
+            xobj1 = pdf_new_xobject(ctx, pdfout, mediabox, fz_identity, NULL, res);
+            // store spage contents
+            JM_update_stream(ctx, pdfout, xobj1, res, 1);
+            fz_drop_buffer(ctx, res);
+
+            // store spage resources
+            pdf_dict_put_drop(ctx, xobj1, PDF_NAME(Resources), resources);
+        }
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return xobj1;
+}
+
+//-----------------------------------------------------------------------------
+// Insert a buffer as a new separate /Contents object of a page.
+// 1. Create a new stream object from buffer 'newcont'
+// 2. If /Contents already is an array, append (overlay) or prepend this object.
+// 3. Else, create a new array holding the old content object and this object.
+//    If the page had no /Contents before, just create a 1-item array.
+// Returns the xref of the new stream object.
+//-----------------------------------------------------------------------------
+int JM_insert_contents(fz_context * ctx, pdf_document * pdf,
+                        pdf_obj * pageref, fz_buffer * newcont, int overlay)
+{
+    int xref = 0;
+    pdf_obj *newconts = NULL;
+    pdf_obj *carr = NULL;
+    fz_var(newconts);
+    fz_var(carr);
+    fz_try(ctx) {
+        pdf_obj *contents = pdf_dict_get(ctx, pageref, PDF_NAME(Contents));
+        newconts = pdf_add_stream(ctx, pdf, newcont, NULL, 0);
+        xref = pdf_to_num(ctx, newconts);
+        if (pdf_is_array(ctx, contents)) {
+            if (overlay) // append new object
+                pdf_array_push(ctx, contents, newconts);
+            else // prepend new object
+                pdf_array_insert(ctx, contents, newconts, 0);
+        }
+        else {
+            carr = pdf_new_array(ctx, pdf, 5);
+            if (overlay) {
+                if (contents)
+                    pdf_array_push(ctx, carr, contents);
+                pdf_array_push(ctx, carr, newconts);
+            }
+            else {
+                pdf_array_push(ctx, carr, newconts);
+                if (contents)
+                    pdf_array_push(ctx, carr, contents);
+            }
+            pdf_dict_put(ctx, pageref, PDF_NAME(Contents), carr);
+        }
+    }
+    fz_always(ctx) {
+        // the array / page dictionary now hold their own references
+        pdf_drop_obj(ctx, newconts);
+        pdf_drop_obj(ctx, carr);
+    }
+    fz_catch(ctx) {
+        fz_rethrow(ctx);
+    }
+    return xref;
+}
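+
+// Example (traced through the branches above): the page has
+// "/Contents 12 0 R" and the new stream receives xref 40. The xref numbers
+// are made up for illustration.
+//
+//     overlay = 1:  /Contents [12 0 R 40 0 R]   (new content drawn last)
+//     overlay = 0:  /Contents [40 0 R 12 0 R]   (new content drawn first)
+//
+// Both calls return 40.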
+
+static PyObject *img_info = NULL;
+
+static fz_image *
+JM_image_filter(fz_context * ctx, void *opaque, fz_matrix ctm, const char *name, fz_image *image)
+{
+    fz_quad q = fz_transform_quad(fz_quad_from_rect(fz_unit_rect), ctm);
+    PyObject *q_py = JM_py_from_quad(q);
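+    PyObject *item = Py_BuildValue("sO", name, q_py);
+    if (item) {
+        PyList_Append(img_info, item);
+        Py_DECREF(item);  // PyList_Append does not steal the reference
+    }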
+    Py_DECREF(q_py);
+    return NULL;
+}
+
+void
+JM_filter_content_stream(
+    fz_context * ctx,
+    pdf_document * doc,
+    pdf_obj * in_stm,
+    pdf_obj * in_res,
+    fz_matrix transform,
+    pdf_filter_options * filter,
+    int struct_parents,
+    fz_buffer **out_buf,
+    pdf_obj **out_res)
+{
+    pdf_processor *proc_buffer = NULL;
+    pdf_processor *proc_filter = NULL;
+
+    fz_var(proc_buffer);
+    fz_var(proc_filter);
+
+    *out_buf = NULL;
+    *out_res = NULL;
+
+    fz_try(ctx) {
+        *out_buf = fz_new_buffer(ctx, 1024);
+        proc_buffer = pdf_new_buffer_processor(ctx, *out_buf, filter->ascii);
+        if (filter->sanitize) {
+            *out_res = pdf_new_dict(ctx, doc, 1);
+            proc_filter = pdf_new_filter_processor(ctx, doc, proc_buffer, in_res, *out_res, struct_parents, transform, filter);
+            pdf_process_contents(ctx, proc_filter, doc, in_res, in_stm, NULL);
+            pdf_close_processor(ctx, proc_filter);
+        }
+        else {
+            *out_res = pdf_keep_obj(ctx, in_res);
+            pdf_process_contents(ctx, proc_buffer, doc, in_res, in_stm, NULL);
+        }
+        pdf_close_processor(ctx, proc_buffer);
+    }
+    fz_always(ctx) {
+        pdf_drop_processor(ctx, proc_filter);
+        pdf_drop_processor(ctx, proc_buffer);
+    }
+    fz_catch(ctx) {
+        fz_drop_buffer(ctx, *out_buf);
+        *out_buf = NULL;
+        pdf_drop_obj(ctx, *out_res);
+        *out_res = NULL;
+        fz_rethrow(ctx);
+    }
+}
+
+PyObject *
+JM_image_reporter(fz_context *ctx, pdf_page *page)
+{
+    pdf_document *doc = page->doc;
+    pdf_filter_options filter;
+    memset(&filter, 0, sizeof filter);
+    filter.opaque = page;
+    filter.text_filter = NULL;
+    filter.image_filter = JM_image_filter;
+    filter.end_page = NULL;
+    filter.recurse = 0;
+    filter.instance_forms = 1;
+    filter.sanitize = 1;
+    filter.ascii = 1;
+
+    pdf_obj *contents, *old_res;
+    pdf_obj *struct_parents_obj;
+    pdf_obj *new_res;
+    fz_buffer *buffer;
+    int struct_parents;
+
+    struct_parents_obj = pdf_dict_get(ctx, page->obj, PDF_NAME(StructParents));
+    struct_parents = -1;
+    if (pdf_is_number(ctx, struct_parents_obj))
+        struct_parents = pdf_to_int(ctx, struct_parents_obj);
+
+    contents = pdf_page_contents(ctx, page);
+    old_res = pdf_page_resources(ctx, page);
+    img_info = PyList_New(0);
+    JM_filter_content_stream(ctx, doc, contents, old_res, fz_identity, &filter, struct_parents, &buffer, &new_res);
+    fz_drop_buffer(ctx, buffer);
+    pdf_drop_obj(ctx, new_res);
+    PyObject *rc = PySequence_Tuple(img_info);
+    Py_DECREF(img_info);
+    img_info = NULL;
+    return rc;
+}
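+
+// Shape of the result (from JM_image_filter above): one (name, quad) entry
+// per image reported by the content filter, for example
+//
+//     (("Im1", quad), ("Im2", quad))
+//
+// where quad is the unit square transformed by the image's CTM, i.e. the
+// area the image covers in the page's PDF coordinate system (filtering
+// starts from fz_identity). The names "Im1" and "Im2" are hypothetical.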
+
+%}
diff --git a/fitz/utils.py b/fitz/utils.py

new file mode 100644

index 0000000..8ec031f
--- /dev/null
+++ b/fitz/utils.py
@@ -0,0 +1,3601 @@
+from __future__ import division
+
+import io
+import math
+import os
+import warnings
+
+from fitz import *
+
+
+"""
+This is a collection of functions to extend PyMuPDF.
+"""
+
+
+def writeText(
+    page,
+    rect=None,
+    writers=None,
+    opacity=None,
+    color=None,
+    overlay=True,
+    keep_proportion=True,
+    rotate=0,
+):
+    """Write the text of one or more TextWriter objects.
+
+    Args:
+        rect: target rectangle. If None, the union of the text writers is used.
+        writers: one or more TextWriter objects.
+        opacity: (float) if given, overrides the writers' own opacity.
+        color: if given, overrides the writers' own text color.
+        overlay: put in foreground or background.
+        keep_proportion: maintain aspect ratio of rectangle sides.
+        rotate: arbitrary rotation angle.
+    """
+    if not writers:
+        raise ValueError("specify at least one TextWriter")
+    if type(writers) is TextWriter:
+        if rotate == 0 and rect is None:
+            writers.writeText(page, opacity=opacity, color=color, overlay=overlay)
+            return None
+        else:
+            writers = (writers,)
+    clip = writers[0].textRect
+    textdoc = Document()
+    tpage = textdoc.newPage(width=page.rect.width, height=page.rect.height)
+    for writer in writers:
+        clip |= writer.textRect
+        writer.writeText(tpage, opacity=opacity, color=color)
+    if rect is None:
+        rect = clip
+    page.showPDFpage(
+        rect,
+        textdoc,
+        0,
+        overlay=overlay,
+        keep_proportion=keep_proportion,
+        rotate=rotate,
+        clip=clip,
+    )
+    textdoc = None
+    tpage = None
+
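+
+# Usage sketch, assuming this function is exposed as Page.writeText and
+# that TextWriter(rect) and TextWriter.append(pos, text, fontsize=...) exist
+# as in PyMuPDF 1.17.x:
+#
+#     import fitz
+#     doc = fitz.open()
+#     page = doc.newPage()
+#     tw = fitz.TextWriter(page.rect)
+#     tw.append((50, 100), "Hello, world!", fontsize=14)
+#     page.writeText(writers=tw, rotate=90)  # rotated via showPDFpage()
+#     doc.save("out.pdf")
+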
+
+def showPDFpage(
+    page,
+    rect,
+    src,
+    pno=0,
+    overlay=True,
+    keep_proportion=True,
+    rotate=0,
+    reuse_xref=0,
+    clip=None,
+):
+    """Show page number 'pno' of PDF 'src' in rectangle 'rect'.
+
+    Args:
+        rect: (rect-like) where to place the source image
+        src: (document) source PDF
+        pno: (int) source page number
+        overlay: (bool) put in foreground
+        keep_proportion: (bool) do not change width-height-ratio
+        rotate: (float) rotation angle in degrees
+        clip: (rect-like) part of source page rectangle
+    Returns:
+        xref of inserted object (for reuse)
+    """
+
+    def calc_matrix(sr, tr, keep=True, rotate=0):
+        """ Calculate transformation matrix from source to target rect.
+
+        Notes:
+            The product of four matrices in this sequence: (1) translate the
+            source rect's center to the origin, (2) rotate, (3) scale, (4)
+            translate to the target rect's center.
+        Args:
+            sr: source rect in PDF (!) coordinate system
+            tr: target rect in PDF coordinate system
+            keep: whether to keep source ratio of width to height
+            rotate: rotation angle in degrees
+        Returns:
+            Transformation matrix.
+        """
+        # calc center point of source rect
+        smp = Point((sr.x1 + sr.x0) / 2.0, (sr.y1 + sr.y0) / 2.0)
+        # calc center point of target rect
+        tmp = Point((tr.x1 + tr.x0) / 2.0, (tr.y1 + tr.y0) / 2.0)
+
+        rot = Matrix(rotate)  # rotation matrix
+
+        # m moves to (0, 0), then rotates
+        m = Matrix(1, 0, 0, 1, -smp.x, -smp.y) * rot
+
+        sr1 = sr * m  # resulting source rect to calculate scale factors
+
+        fw = tr.width / sr1.width  # scale the width
+        fh = tr.height / sr1.height  # scale the height
+        if keep:
+            fw = fh = min(fw, fh)  # take min if keeping aspect ratio
+
+        m *= Matrix(fw, fh)  # concat scale matrix
+        m *= Matrix(1, 0, 0, 1, tmp.x, tmp.y)  # concat move to target center
+        return JM_TUPLE(m)
+
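+    # Worked example for calc_matrix: a 200 x 100 source rect and a
+    # 100 x 100 target with rotate=0 give fw = 0.5, fh = 1.0; keep=True
+    # makes both 0.5, so the source appears as 100 x 50, centered in the
+    # target. With rotate=90 the rotated source is 100 x 200, so fw = 1.0,
+    # fh = 0.5, and again both scale factors become 0.5.
+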
+    CheckParent(page)
+    doc = page.parent
+
+    if not doc.isPDF or not src.isPDF:
+        raise ValueError("not a PDF")
+
+    rect = page.rect & rect  # intersect with page rectangle
+    if rect.isEmpty or rect.isInfinite:
+        raise ValueError("rect must be finite and not empty")
+
+    if reuse_xref > 0:
+        warnings.warn("ignoring 'reuse_xref'", DeprecationWarning)
+
+    while pno < 0:  # support negative page numbers
+        pno += len(src)
+    src_page = src[pno]  # load source page
+    if len(src_page._getContents()) == 0:
+        raise ValueError("nothing to show - source page empty")
+
+    tar_rect = rect * ~page.transformationMatrix  # target rect in PDF coordinates
+
+    src_rect = src_page.rect if not clip else src_page.rect & clip  # source rect
+    if src_rect.isEmpty or src_rect.isInfinite:
+        raise ValueError("clip must be finite and not empty")
+    src_rect = src_rect * ~src_page.transformationMatrix  # ... in PDF coord
+
+    matrix = calc_matrix(src_rect, tar_rect, keep=keep_proportion, rotate=rotate)
+
+    # list of existing /Form /XObjects
+    ilst = [i[1] for i in doc._getPageInfo(page.number, 3)]
+
+    # create a name not in that list
+    n = "fzFrm"
+    i = 0
+    _imgname = n + "0"
+    while _imgname in ilst:
+        i += 1
+        _imgname = n + str(i)
+
+    isrc = src._graft_id  # used as key for graftmaps
+    if doc._graft_id == isrc:
+        raise ValueError("source document must not equal target")
+
+    # check if we have already copied objects from this source doc
+    if isrc in doc.Graftmaps:  # yes: use the old graftmap
+        gmap = doc.Graftmaps[isrc]
+    else:  # no: make a new graftmap
+        gmap = Graftmap(doc)
+        doc.Graftmaps[isrc] = gmap
+
+    # take note of generated xref for automatic reuse
+    pno_id = (isrc, pno)  # id of src[pno]
+    xref = doc.ShownPages.get(pno_id, 0)
+
+    xref = page._showPDFpage(
+        src_page,
+        overlay=overlay,
+        matrix=matrix,
+        xref=xref,
+        clip=src_rect,
+        graftmap=gmap,
+        _imgname=_imgname,
+    )
+    doc.ShownPages[pno_id] = xref
+
+    return xref
+
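+
+# Usage sketch (file names are hypothetical): show page 0 of "src.pdf" in
+# the top half of a new page, rotated by 90 degrees, keeping proportions.
+#
+#     import fitz
+#     src = fitz.open("src.pdf")
+#     doc = fitz.open()
+#     page = doc.newPage()
+#     r = fitz.Rect(0, 0, page.rect.width, page.rect.height / 2)
+#     xref = page.showPDFpage(r, src, 0, rotate=90)
+#     doc.save("out.pdf")
+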
+
+def insertImage(
+    page,
+    rect,
+    filename=None,
+    pixmap=None,
+    stream=None,
+    rotate=0,
+    keep_proportion=True,
+    overlay=True,
+):
+    """Insert an image in a rectangle on the current page.
+
+    Notes:
+        Exactly one of filename, pixmap or stream must be provided.
+    Args:
+        rect: (rect-like) where to place the source image
+        filename: (str) name of an image file
+        pixmap: (obj) a Pixmap object
+        stream: (bytes) an image in memory
+        rotate: (int) degrees (multiple of 90)
+        keep_proportion: (bool) whether to maintain aspect ratio
+        overlay: (bool) put in foreground
+    """
+
+    def calc_matrix(fw, fh, tr, rotate=0):
+        """ Calculate transformation matrix for image insertion.
+
+        Notes:
+            The image will preserve its aspect ratio if and only if arguments
+            fw, fh are both equal to 1.
+        Args:
+            fw, fh: width / height ratio factors of image - floats in (0,1].
+                At least one of them (corresponding to the longer side) is equal to 1.
+            tr: target rect in PDF coordinates
+            rotate: rotation angle in degrees
+        Returns:
+            Transformation matrix.
+        """
+        # center point of target rect
+        tmp = Point((tr.x1 + tr.x0) / 2.0, (tr.y1 + tr.y0) / 2.0)
+
+        rot = Matrix(rotate)  # rotation matrix
+
+        # matrix m moves image center to (0, 0), then rotates
+        m = Matrix(1, 0, 0, 1, -0.5, -0.5) * rot
+
+        # sr1 = sr * m  # resulting image rect
+
+        # --------------------------------------------------------------------
+        # calculate the scale matrix
+        # --------------------------------------------------------------------
+        small = min(fw, fh)  # factor of the smaller side
+
+        if rotate not in (0, 180):
+            fw, fh = fh, fw  # width / height exchange their roles
+
+        if fw < 1:  # portrait
+            if tr.width / fw > tr.height / fh:
+                w = tr.height * small
+                h = tr.height
+            else:
+                w = tr.width
+                h = tr.width / small
+
+        elif fw != fh:  # landscape
+            if tr.width / fw > tr.height / fh:
+                w = tr.height / small
+                h = tr.height
+            else:
+                w = tr.width
+                h = tr.width * small
+
+        else:  # (treated as) equal sided
+            w = tr.width
+            h = tr.height
+
+        m *= Matrix(w, h)  # concat scale matrix
+
+        m *= Matrix(1, 0, 0, 1, tmp.x, tmp.y)  # concat move to target center
+
+        return m
+
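+    # Worked example for calc_matrix: a 200 x 400 image has fw = 0.5 and
+    # fh = 1.0 (the longer side is 1). For a 300 x 300 target and rotate=0,
+    # tr.width / fw = 600 > tr.height / fh = 300, so w = 300 * 0.5 = 150 and
+    # h = 300: the image is drawn at 150 x 300, keeping its 1:2 ratio.
+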
+    # -------------------------------------------------------------------------
+
+    CheckParent(page)
+    doc = page.parent
+    if not doc.isPDF:
+        raise ValueError("not a PDF")
+    if bool(filename) + bool(stream) + bool(pixmap) != 1:
+        raise ValueError("need exactly one of filename, pixmap, stream")
+
+    if filename and not os.path.exists(filename):
+        raise FileNotFoundError("No such file: '%s'" % filename)
+    elif stream and type(stream) not in (bytes, bytearray, io.BytesIO):
+        raise ValueError("stream must be bytes-like or BytesIO")
+    elif pixmap and type(pixmap) is not Pixmap:
+        raise ValueError("pixmap must be a Pixmap")
+
+    while rotate < 0:
+        rotate += 360
+    while rotate >= 360:
+        rotate -= 360
+    if rotate not in (0, 90, 180, 270):
+        raise ValueError("bad rotate value")
+
+    r = page.CropBox & rect
+    if r.isEmpty or r.isInfinite:
+        raise ValueError("rect must be finite and not empty")
+
+    _imgpointer = None
+
+    # -------------------------------------------------------------------------
+    # Calculate the matrix for image insertion.
+    # -------------------------------------------------------------------------
+    # If aspect ratio must be kept, we need to know image width and height.
+    # Easy for pixmaps. For file and stream cases, we make an fz_image and
+    # take those values from it. In this case, we also hand the fz_image over
+    # to the actual C-level function (_imgpointer), and set all other
+    # parameters to None.
+    # -------------------------------------------------------------------------
+    if keep_proportion is True:  # for this we need the image dimension
+        if pixmap:  # this is the easy case
+            w = pixmap.width
+            h = pixmap.height
+
+        elif stream:  # use tool to access the information
+            # we also pass through the generated fz_image address
+            if type(stream) is io.BytesIO:
+                stream = stream.getvalue()
+            img_prof = TOOLS.image_profile(stream, keep_image=True)
+            w, h = img_prof["width"], img_prof["height"]
+            stream = None  # make sure this arg is NOT used
+            _imgpointer = img_prof["image"]  # pointer to fz_image
+
+        else:  # worst case: must read the file
+            with open(filename, "rb") as f:  # close the file handle promptly
+                stream = f.read()
+            img_prof = TOOLS.image_profile(stream, keep_image=True)
+            w, h = img_prof["width"], img_prof["height"]
+            stream = None  # make sure this arg is NOT used
+            filename = None  # make sure this arg is NOT used
+            _imgpointer = img_prof["image"]  # pointer to fz_image
+
+        maxf = max(w, h)
+        fw = w / maxf
+        fh = h / maxf
+    else:
+        fw = fh = 1.0
+
+    clip = r * ~page.transformationMatrix  # target rect in PDF coordinates
+
+    matrix = calc_matrix(fw, fh, clip, rotate=rotate)  # calculate matrix
+
+    # Create a unique image reference name. First make existing names list.
+    ilst = [i[7] for i in doc.getPageImageList(page.number)]  # existing names
+    n = "fzImg"  # 'fitz image'
+    i = 0
+    _imgname = n + "0"  # first name candidate
+    while _imgname in ilst:
+        i += 1
+        _imgname = n + str(i)  # try new name
+
+    page._insertImage(
+        filename=filename,  # image in file
+        pixmap=pixmap,  # image in pixmap
+        stream=stream,  # image in memory
+        matrix=matrix,  # generated matrix
+        overlay=overlay,
+        _imgname=_imgname,  # generated PDF resource name
+        _imgpointer=_imgpointer,  # address of fz_image
+    )
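Before it does anything else, insertImage reduces any `rotate` value to the range [0, 360) with two `while` loops, then accepts only quarter turns. Here is a minimal sketch of the same normalization. It relies on Python's `%` operator, which always returns a non-negative result for a positive divisor. `normalize_rotation` is a hypothetical helper and is not part of PyMuPDF.

```python
def normalize_rotation(rotate):
    """Reduce a rotation angle to [0, 360) and accept only quarter turns."""
    rotate %= 360  # e.g. -90 -> 270, 450 -> 90, 720 -> 0
    if rotate not in (0, 90, 180, 270):
        raise ValueError("bad rotate value")
    return rotate
```

A single modulo operation does the work of both loops, and it takes constant time even for very large angles.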
+
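With `keep_proportion`, calc_matrix scales the unit image square so that it keeps the image's aspect ratio and still fits inside the target rectangle. For 90 and 270 degree rotations it first exchanges the roles of width and height. The sketch below computes the fitted display size directly from pixel dimensions. It uses a single `min()` of the two scale factors in place of the portrait/landscape branches, and for the cases traced by hand it gives the same `w, h` as calc_matrix. `fit_size` is a hypothetical helper and does not build a `fitz.Matrix`.

```python
def fit_size(img_w, img_h, box_w, box_h, rotate=0):
    """Largest (width, height) with the image's aspect ratio that fits the box."""
    if rotate % 180 == 90:  # a quarter turn exchanges width and height
        img_w, img_h = img_h, img_w
    # the tighter of the two dimensions limits the common scale factor
    scale = min(box_w / img_w, box_h / img_h)
    return img_w * scale, img_h * scale
```

For example, a 200 x 100 pixel image placed in a 100 x 100 target becomes 100 x 50. Rotated by 90 degrees, it becomes 50 x 100.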
+
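To name a new image resource, insertImage collects the names already used on the page (item 7 of each getPageImageList entry) and tries `fzImg0`, `fzImg1`, ... until it finds a free one. The sketch below shows the same search with a set for constant-time lookups. `unique_image_name` is a hypothetical helper.

```python
def unique_image_name(existing, prefix="fzImg"):
    """Return the first name prefix0, prefix1, ... that is not already in use."""
    taken = set(existing)  # set lookup instead of scanning a list each time
    i = 0
    while f"{prefix}{i}" in taken:
        i += 1
    return f"{prefix}{i}"
```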
+def searchFor(page, text, hit_max=16, quads=False, flags=None):
+    """ Search for a string on a page.
+
+    Args:
+        text: string to be searched for
+        hit_max: maximum hits
+        quads: return quads instead of rectangles
+        flags: (int) control the amount of data parsed into the textpage.
+    Returns:
+        a list of rectangles or quads, each containing one occurrence.
+    """
+    CheckParent(page)
+    if flags is None:
+        flags = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
+    tp = page.getTextPage(flags)  # create TextPage
+    rlist = tp.search(text, hit_max=hit_max, quads=quads)
+    tp = None
+    return rlist
+
+
+def searchPageFor(doc, pno, text, hit_max=16, quads=False, flags=None):
+    """ Search for a string on a page.
+
+    Args:
+        pno: page number
+        text: string to be searched for
+        hit_max: maximum hits
+        quads: return quads instead of rectangles
+        flags: (int) control the amount of data parsed into the textpage.
+    Returns:
+        a list of rectangles or quads, each containing an occurrence.
+    """
+
+    return doc[pno].searchFor(text, hit_max=hit_max, quads=quads, flags=flags)
+
+
+def getTextBlocks(page, flags=None):
+    """Return the text blocks on a page.
+
+    Notes:
+        Lines in a block are concatenated with line breaks.
+    Args:
+        flags: (int) control the amount of data parsed into the textpage.
+    Returns:
+        A list of the blocks. Each item contains the containing rectangle
+        coordinates, text lines, block type and running block number.
+    """
+    CheckParent(page)
+    if flags is None:
+        flags = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
+    tp = page.getTextPage(flags)
+    l = []
+    tp.extractBLOCKS(l)
+    del tp
+    return l
+
+
+def getTextWords(page, flags=None):
+    """Return the text words as a list with the bbox for each word.
+
+    Args:
+        flags: (int) control the amount of data parsed into the textpage.
+    """
+    CheckParent(page)
+    if flags is None:
+        flags = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
+    tp = page.getTextPage(flags)
+    l = []
+    tp.extractWORDS(l)
+    del tp
+    return l
+
+
+def getText(page, option="text", flags=None):
+    """ Extract a document page's text.
+
+    This is a unifying wrapper for various methods of Page / TextPage classes.
+
+    Args:
+        option: (str) text, words, blocks, html, dict, json, rawdict, xhtml or xml.
+
+    Returns:
+        the output of Page methods getTextWords / getTextBlocks or TextPage
+        methods extractText, extractHTML, extractDICT, extractJSON, extractRAWDICT,
+        extractXHTML or extractXML respectively.
+        Defaults to "text"; unrecognized options also fall back to "text".
+    """
+    option = option.lower()
+    if option == "words":
+        return getTextWords(page, flags=flags)
+    if option == "blocks":
+        return getTextBlocks(page, flags=flags)
+    CheckParent(page)
+    # available output types
+    formats = ("text", "html", "json", "xml", "xhtml", "dict", "rawdict")
+    if option not in formats:
+        option = "text"
+    # choose which of them also include images in the TextPage
+    images = (0, 1, 1, 0, 1, 1, 1)  # controls image inclusion in text page
+    f = formats.index(option)
+    if flags is None:
+        flags = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE
+        if images[f] == 1:
+            flags |= TEXT_PRESERVE_IMAGES
+
+    tp = page.getTextPage(flags)  # TextPage with or without images
+
+    if f == 2:
+        t = tp.extractJSON()
+    elif f == 5:
+        t = tp.extractDICT()
+    elif f == 6:
+        t = tp.extractRAWDICT()
+    else:
+        t = tp._extractText(f)
+
+    del tp
+    return t
+
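Once "words" and "blocks" have been handled, getText maps its option string to a position in the `formats` tuple. A parallel `images` tuple then decides whether `TEXT_PRESERVE_IMAGES` is added to the default flags. The sketch below performs the same lookup with a set in place of the parallel tuples and leaves out the flag constants. `resolve_text_option` is a hypothetical helper.

```python
FORMATS = ("text", "html", "json", "xml", "xhtml", "dict", "rawdict")
# formats whose TextPage is created with images (mirrors getText's images tuple)
WITH_IMAGES = {"html", "json", "xhtml", "dict", "rawdict"}


def resolve_text_option(option):
    """Return (normalized option, whether images should be included)."""
    option = option.lower()
    if option not in FORMATS:
        option = "text"  # unknown names fall back to plain text
    return option, option in WITH_IMAGES
```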
+
+def getPageText(doc, pno, option="text", flags=None):
+    """ Extract a document page's text by page number.
+
+    Notes:
+        Convenience function calling page.getText().
+    Args:
+        pno: page number
+        option: (str) text, words, blocks, html, dict, json, rawdict, xhtml or xml.
+    Returns:
+        output from page.getText().
+    """
+    return doc[pno].getText(option, flags=flags)
+
+
+def getPixmap(page, matrix=None, colorspace=csRGB, clip=None, alpha=False, annots=True):
+    """Create pixmap of page.
+
+    Args:
+        matrix: Matrix for transformation (default: Identity).
+        colorspace: (str/Colorspace) cmyk, rgb, gray - case ignored, default csRGB.
+        clip: (irect-like) restrict rendering to this area.
+        alpha: (bool) whether to include alpha channel
+        annots: (bool) whether to also render annotations
+    """
+    CheckParent(page)
+    doc = page.parent
+    if type(colorspace) is str:
+        if colorspace.upper() == "GRAY":
+            colorspace = csGRAY
+        elif colorspace.upper() == "CMYK":
+            colorspace = csCMYK
+        else:
+            colorspace = csRGB
+    if colorspace.n not in (1, 3, 4):
+        raise ValueError("unsupported colorspace")
+
+    return page._makePixmap(doc, matrix, colorspace, alpha, annots, clip)
+
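getPixmap also accepts a colorspace given as a case-insensitive string, and any name other than "gray" or "cmyk" silently becomes RGB. The resulting component count `n` must be 1, 3 or 4. The sketch below covers only the string branch and returns that count. `colorspace_components` is a hypothetical helper that does not create `fitz.Colorspace` objects.

```python
def colorspace_components(name):
    """Number of color components for a case-insensitive colorspace name."""
    name = name.upper()
    if name == "GRAY":
        return 1
    if name == "CMYK":
        return 4
    return 3  # any other string falls back to RGB, as in getPixmap
```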
+
+def getPagePixmap(
+    doc, pno, matrix=None, colorspace=csRGB, clip=None, alpha=False, annots=True
+):
+    """Create pixmap of document page by page number.
+
+    Notes:
+        Convenience function calling page.getPixmap.
+    Args:
+        pno: (int) page number
+        matrix: Matrix for transformation (default: Identity).
+        colorspace: (str/Colorspace) cmyk, rgb, gray - case ignored, default csRGB.
+        clip: (irect-like) restrict rendering to this area.
+        alpha: (bool) include alpha channel
+        annots: (bool) also render annotations
+    """
+    return doc[pno].getPixmap(
+        matrix=matrix, colorspace=colorspace, clip=clip, alpha=alpha, annots=annots
+    )
+
+
+def getLinkDict(ln):
+    nl = {"kind": ln.dest.kind, "xref": 0}
+    try:
+        nl["from"] = ln.rect
+    except:
+        pass
+    pnt = Point(0, 0)
+    if ln.dest.flags & LINK_FLAG_L_VALID:
+        pnt.x = ln.dest.lt.x
+    if ln.dest.flags & LINK_FLAG_T_VALID:
+        pnt.y = ln.dest.lt.y
+
+    if ln.dest.kind == LINK_URI:
+        nl["uri"] = ln.dest.uri
+
+    elif ln.dest.kind == LINK_GOTO:
+        nl["page"] = ln.dest.page
+        nl["to"] = pnt
+        if ln.dest.flags & LINK_FLAG_R_IS_ZOOM:
+            nl["zoom"] = ln.dest.rb.x
+        else:
+            nl["zoom"] = 0.0
+
+    elif ln.dest.kind == LINK_GOTOR:
+        nl["file"] = ln.dest.fileSpec.replace("\\", "/")
+        nl["page"] = ln.dest.page
+        if ln.dest.page < 0:
+            nl["to"] = ln.dest.dest
+        else:
+            nl["to"] = pnt
+            if ln.dest.flags & LINK_FLAG_R_IS_ZOOM:
+                nl["zoom"] = ln.dest.rb.x
+            else:
+                nl["zoom"] = 0.0
+
+    elif ln.dest.kind == LINK_LAUNCH:
+        nl["file"] = ln.dest.fileSpec.replace("\\", "/")
+
+    elif ln.dest.kind == LINK_NAMED:
+        nl["name"] = ln.dest.named
+
+    else:
+        nl["page"] = ln.dest.page
+
+    return nl
+
+
+def getLinks(page):
+    """Create a list of all links contained in a PDF page.
+
+    Notes:
+        See the PyMuPDF documentation for details.
+    """
+
+    CheckParent(page)
+    ln = page.firstLink
+    links = []
+    while ln:
+        nl = getLinkDict(ln)
+        # if nl["kind"] == LINK_GOTO:
+        #    if type(nl["to"]) is Point and nl["page"] >= 0:
+        #        doc = page.parent
+        #        target_page = doc[nl["page"]]
+        #        ctm = target_page.transformationMatrix
+        #        point = nl["to"] * ctm
+        #        nl["to"] = point
+        links.append(nl)
+        ln = ln.next
+    if len(links) > 0:
+        linkxrefs = page._getLinkXrefs()
+        if len(linkxrefs) == len(links):
+            for i in range(len(linkxrefs)):
+                links[i]["xref"] = linkxrefs[i]
+    return links
+
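getLinks matches each link from MuPDF's link chain with the page's link annotation xrefs purely by position. It therefore copies the xrefs only when both lists have the same length, and otherwise every link keeps `xref` 0. The sketch below captures that defensive pairing on plain dictionaries. `attach_xrefs` is a hypothetical helper.

```python
def attach_xrefs(links, xrefs):
    """Copy xref numbers into link dicts only if both lists line up one-to-one."""
    if len(links) == len(xrefs):
        for link, xref in zip(links, xrefs):
            link["xref"] = xref
    return links  # on a count mismatch every link keeps its default xref 0
```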
+
+def getToC(doc, simple=True):
+    """Create a table of contents.
+
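+    Args:
+        simple: (bool) control the amount of detail in the output.
+    Returns:
+        A list of entries [level, title, page number], each extended by a
+        link destination dictionary if simple is False. Page numbers are
+        1-based; -1 means the entry has no target in this document.
+        For details see the PyMuPDF documentation.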
+    """
+
+    def recurse(olItem, liste, lvl):
+        """Recursively follow the outline item chain and record item information in a list."""
+        while olItem:
+            if olItem.title:
+                title = olItem.title
+            else:
+                title = " "
+
+            if not olItem.isExternal:
+                if olItem.uri:
+                    if olItem.page == -1:
+                        resolve = doc.resolveLink(olItem.uri)
+                        page = resolve[0] + 1
+                    else:
+                        page = olItem.page + 1
+                else:
+                    page = -1
+            else:
+                page = -1
+
+            if not simple:
+                link = getLinkDict(olItem)
+                liste.append([lvl, title, page, link])
+            else:
+                liste.append([lvl, title, page])
+
+            if olItem.down:
+                liste = recurse(olItem.down, liste, lvl + 1)
+            olItem = olItem.next
+        return liste
+
+    # check that the document is open
+    if doc.isClosed:
+        raise ValueError("document closed")
+    doc.initData()
+    olItem = doc.outline
+
+    if not olItem:
+        return []
+    lvl = 1
+    liste = []
+    return recurse(olItem, liste, lvl)
+
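getToC treats the outline as a linked structure: `down` leads to an item's first child and `next` to its following sibling. The recursion descends one level for each `down` link. The sketch below runs the same walk on a hypothetical `Node` class, which is not PyMuPDF's Outline type. It ignores URI resolution and external links and only converts 0-based pages to 1-based numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an outline item."""
    title: str
    page: int  # 0-based page number, -1 if there is no target
    down: Optional["Node"] = None  # first child
    next: Optional["Node"] = None  # next sibling


def flatten_outline(node, lvl=1, out=None) -> List[list]:
    """Return [level, title, 1-based page] rows in document order."""
    if out is None:
        out = []
    while node:
        page = node.page + 1 if node.page >= 0 else -1
        out.append([lvl, node.title or " ", page])  # empty titles become " "
        if node.down:
            flatten_outline(node.down, lvl + 1, out)  # children, one level deeper
        node = node.next
    return out
```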
+
+def getRectArea(*args):
+    """Calculate area of rectangle.\nparameter is one of 'px' (default), 'in', 'cm', or 'mm'."""
+    rect = args[0]
+    if len(args) > 1:
+        unit = args[1]
+    else:
+        unit = "px"
+    u = {"px": (1, 1), "in": (1.0, 72.0), "cm": (2.54, 72.0), "mm": (25.4, 72.0)}
+    f = (u[unit][0] / u[unit][1]) ** 2
+    return f * rect.width * rect.height
+
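getRectArea takes the rectangle's sides in points (1/72 inch, which it calls "px"). Converting a length to the requested unit multiplies it by `unit_per_inch / 72`, so the area is multiplied by the square of that factor. A 72 x 72 point rectangle is therefore 1 square inch, or 2.54² = 6.4516 square centimetres. `rect_area` below is a hypothetical variant that takes the width and height directly instead of a `Rect`.

```python
UNITS = {"px": (1, 1), "in": (1.0, 72.0), "cm": (2.54, 72.0), "mm": (25.4, 72.0)}


def rect_area(width, height, unit="px"):
    """Area of a width x height rectangle given in points (1/72 inch)."""
    num, den = UNITS[unit]
    return (num / den) ** 2 * width * height  # the linear factor enters squared
```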
+
+def setMetadata(doc, m):
+    """Set a PDF's metadata (/Info dictionary).\nm: dictionary like doc.metadata."""
+    if doc.isClosed or doc.isEncrypted:
+        raise ValueError("document closed or encrypted")
+    if type(m) is not dict:
+        raise ValueError("arg2 must be a dictionary")
+    for k in m.keys():
+        if not k in (
+            "author",
+            "producer",
+            "creator",
+            "title",
+            "format",
+            "encryption",
+            "creationDate",
+            "modDate",
+            "subject",
+            "keywords",
+        ):
+            raise ValueError("invalid dictionary key: " + k)
+    d = "<</Author"
+    d += getPDFstr(m.get("author", "none"))
+    d += "/CreationDate"
+    d += getPDFstr(m.get("creationDate", "none"))
+    d += "/Creator"
+    d += getPDFstr(m.get("creator", "none"))
+    d += "/Keywords"
+    d += getPDFstr(m.get("keywords", "none"))
+    d += "/ModDate"
+    d += getPDFstr(m.get("modDate", "none"))
+    d += "/Producer"
+    d += getPDFstr(m.get("producer", "none"))
+    d += "/Subject"
+    d += getPDFstr(m.get("subject", "none"))
+    d += "/Title"
+    d += getPDFstr(m.get("title", "none"))
+    d += ">>"
+    doc._setMetadata(d)
+    doc.initData()
+    return
+
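setMetadata builds the /Info dictionary as PDF source text in a fixed key order and writes "none" for every key the caller omits. The sketch below shows the same assembly. `pdf_literal` is a hypothetical, simplified, ASCII-only stand-in for getPDFstr: it only escapes backslashes and parentheses as PDF literal strings require, whereas the real function also has to deal with non-ASCII text.

```python
# (metadata key, PDF /Info key) in the order setMetadata writes them
INFO_KEYS = [
    ("author", "Author"),
    ("creationDate", "CreationDate"),
    ("creator", "Creator"),
    ("keywords", "Keywords"),
    ("modDate", "ModDate"),
    ("producer", "Producer"),
    ("subject", "Subject"),
    ("title", "Title"),
]


def pdf_literal(text):
    """Hypothetical ASCII-only stand-in for getPDFstr: a PDF literal string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def info_dict_source(meta):
    """Build the /Info dictionary source, defaulting missing values to 'none'."""
    parts = ["<<"]
    for key, pdf_key in INFO_KEYS:
        parts.append("/" + pdf_key + pdf_literal(meta.get(key, "none")))
    parts.append(">>")
    return "".join(parts)
```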
+
+def getDestStr(xref, ddict):
+    """ Calculate the PDF action string.
+
+    Notes:
+        Supports Link annotations and outline items (bookmarks).
+    """
+    if not ddict:
+        return ""
+    str_goto = "/A<</S/GoTo/D[%i 0 R/XYZ %g %g %i]>>"
+    str_gotor1 = "/A<</S/GoToR/D[%s /XYZ %s %s %s]/F<</F%s/UF%s/Type/Filespec>>>>"
+    str_gotor2 = "/A<</S/GoToR/D%s/F<</F%s/UF%s/Type/Filespec>>>>"
+    str_launch = "/A<</S/Launch/F<</F%s/UF%s/Type/Filespec>>>>"
+    str_uri = "/A<</S/URI/URI%s>>"
+
+    if type(ddict) in (int, float):
+        dest = str_goto % (xref, 0, ddict, 0)
+        return dest
+    d_kind = ddict.get("kind", LINK_NONE)
+
+    if d_kind == LINK_NONE:
+        return ""
+
+    if ddict["kind"] == LINK_GOTO:
+        d_zoom = ddict.get("zoom", 0)
+        to = ddict.get("to", Point(0, 0))
+        d_left, d_top = to
+        dest = str_goto % (xref, d_left, d_top, d_zoom)
+        return dest
+
+    if ddict["kind"] == LINK_URI:
+        dest = str_uri % (getPDFstr(ddict["uri"]),)
+        return dest
+
+    if ddict["kind"] == LINK_LAUNCH:
+        fspec = getPDFstr(ddict["file"])
+        dest = str_launch % (fspec, fspec)
+        return dest
+
+    if ddict["kind"] == LINK_GOTOR and ddict["page"] < 0:
+        fspec = getPDFstr(ddict["file"])
+        dest = str_gotor2 % (getPDFstr(ddict["to"]), fspec, fspec)
+        return dest
+
+    if ddict["kind"] == LINK_GOTOR and ddict["page"] >= 0:
+        fspec = getPDFstr(ddict["file"])
+        dest = str_gotor1 % (
+            ddict["page"],
+            ddict["to"].x,
+            ddict["to"].y,
+            ddict["zoom"],
+            fspec,
+            fspec,
+        )
+        return dest
+
+    return ""
+
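For an internal link, getDestStr fills the `str_goto` template with the target page's xref, the left and top coordinates of an /XYZ destination, and the zoom factor. When a bare number is passed instead of a dictionary, that number becomes the top coordinate, with left and zoom set to 0. `goto_action` below is a hypothetical helper that reuses the same template string.

```python
STR_GOTO = "/A<</S/GoTo/D[%i 0 R/XYZ %g %g %i]>>"  # template used by getDestStr


def goto_action(page_xref, left, top, zoom=0):
    """PDF GoTo action jumping to (left, top) on the page object page_xref."""
    return STR_GOTO % (page_xref, left, top, zoom)
```

Because the template formats zoom with `%i`, any fractional part of a zoom factor is dropped.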
+
+def setToC(doc, toc, collapse=1):
+    """Create new outline tree (table of contents, TOC).
+
+    Args:
+        toc: (list, tuple) each entry must contain level, title, page and
+            optionally top margin on the page. None or '()' remove the TOC.
+        collapse: (int) collapses entries beyond this level. Zero or None
+            shows all entries unfolded.
+    Returns:
+        the number of inserted items, or the number of removed items respectively.
+    """
+    if doc.isClosed or doc.isEncrypted:
+        raise ValueError("document closed or encrypted")
+    if not doc.isPDF:
+        raise ValueError("not a PDF")
+    if not toc:  # remove all entries
+        return len(doc._delToC())
+
+    # validity checks --------------------------------------------------------
+    if type(toc) not in (list, tuple):
+        raise ValueError("'toc' must be list or tuple")
+    toclen = len(toc)
+    pageCount = doc.pageCount
+    t0 = toc[0]
+    if type(t0) not in (list, tuple):
+        raise ValueError("items must be sequences of 3 or 4 items")
+    if t0[0] != 1:
+        raise ValueError("hierarchy level of item 0 must be 1")
+    for i in list(range(toclen - 1)):
+        t1 = toc[i]
+        t2 = toc[i + 1]
+        if not -1 <= t1[2] <= pageCount:
+            raise ValueError("row %i: page number out of range" % i)
+        if (type(t2) not in (list, tuple)) or len(t2) not in (3, 4):
+            raise ValueError("bad row %i" % (i + 1))
+        if (type(t2[0]) is not int) or t2[0] < 1:
+            raise ValueError("bad hierarchy level in row %i" % (i + 1))
+        if t2[0] > t1[0] + 1:
+            raise ValueError("bad hierarchy level in row %i" % (i + 1))
+    # no formal errors in toc --------------------------------------------------
+
+    # --------------------------------------------------------------------------
+    # make a list of xref numbers, which we can use for our TOC entries
+    # --------------------------------------------------------------------------
+    old_xrefs = doc._delToC()  # del old outlines, get their xref numbers
+    old_xrefs = []  # TODO do not reuse them currently
+    # prepare table of xrefs for new bookmarks
+    xref = [0] + old_xrefs
+    xref[0] = doc._getOLRootNumber()  # entry zero is outline root xref#
+    if toclen > len(old_xrefs):  # too few old xrefs?
+        for i in range((toclen - len(old_xrefs))):
+            xref.append(doc._getNewXref())  # acquire new ones
+
+    lvltab = {0: 0}  # to store last entry per hierarchy level
+
+    # ------------------------------------------------------------------------------
+    # contains new outline objects as strings - first one is the outline root
+    # ------------------------------------------------------------------------------
+    olitems = [{"count": 0, "first": -1, "last": -1, "xref": xref[0]}]
+    # ------------------------------------------------------------------------------
+    # build olitems as a list of PDF-like connected dictionaries
+    # ------------------------------------------------------------------------------
+    for i in range(toclen):
+        o = toc[i]
+        lvl = o[0]  # level
+        title = getPDFstr(o[1])  # title
+        pno = min(doc.pageCount - 1, max(0, o[2] - 1))  # page number
+        page = doc[pno]  # load the page
+        ictm = ~page.transformationMatrix  # get inverse transformation matrix
+        top = Point(72, 36) * ictm  # default top location
+        dest_dict = {"to": top, "kind": LINK_GOTO}  # fall back target
+        if o[2] < 0:
+            dest_dict["kind"] = LINK_NONE
+        if len(o) > 3:  # some target is specified
+            if type(o[3]) in (int, float):  # convert a number to a point
+                dest_dict["to"] = Point(72, o[3]) * ictm
+            else:  # if something else, make sure we have a dict
+                dest_dict = o[3] if type(o[3]) is dict else dest_dict
+                if "to" not in dest_dict:  # target point not in dict?
+                    dest_dict["to"] = top  # put default in
+                else:  # transform target to PDF coordinates
+                    point = dest_dict["to"] * ictm
+                    dest_dict["to"] = point
+        d = {}
+        d["first"] = -1
+        d["count"] = 0
+        d["last"] = -1
+        d["prev"] = -1
+        d["next"] = -1
+        d["dest"] = getDestStr(page.xref, dest_dict)
+        d["top"] = dest_dict["to"]
+        d["title"] = title
+        d["parent"] = lvltab[lvl - 1]
+        d["xref"] = xref[i + 1]
+        lvltab[lvl] = i + 1
+        parent = olitems[lvltab[lvl - 1]]  # the parent entry
+
+        if collapse and lvl > collapse:  # suppress expansion
+            parent["count"] -= 1  # make /Count negative
+        else:
+            parent["count"] += 1  # positive /Count
+
+        if parent["first"] == -1:
+            parent["first"] = i + 1
+            parent["last"] = i + 1
+        else:
+            d["prev"] = parent["last"]
+            prev = olitems[parent["last"]]
+            prev["next"] = i + 1
+            parent["last"] = i + 1
+        olitems.append(d)
+
+    # ------------------------------------------------------------------------------
+    # now create each outline item as a string and insert it in the PDF
+    # ------------------------------------------------------------------------------
+    for i, ol in enumerate(olitems):
+        txt = "<<"
+        if ol["count"] != 0:
+            txt += "/Count %i" % ol["count"]
+        try:
+            txt += ol["dest"]
+        except:
+            pass
+        try:
+            if ol["first"] > -1:
+                txt += "/First %i 0 R" % xref[ol["first"]]
+        except:
+            pass
+        try:
+            if ol["last"] > -1:
+                txt += "/Last %i 0 R" % xref[ol["last"]]
+        except:
+            pass
+        try:
+            if ol["next"] > -1:
+                txt += "/Next %i 0 R" % xref[ol["next"]]
+        except:
+            pass
+        try:
+            if ol["parent"] > -1:
+                txt += "/Parent %i 0 R" % xref[ol["parent"]]
+        except:
+            pass
+        try:
+            if ol["prev"] > -1:
+                txt += "/Prev %i 0 R" % xref[ol["prev"]]
+        except:
+            pass
+        try:
+            txt += "/Title" + ol["title"]
+        except:
+            pass
+        if i == 0:  # special: this is the outline root
+            txt += "/Type/Outlines"  # so add the /Type entry
+        txt += ">>"
+        doc._updateObject(xref[i], txt)  # insert the PDF object
+
+    doc.initData()
+    return toclen
+
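Among its validity checks, setToC rejects a TOC whose first row is not at level 1, or in which a row is more than one level deeper than the row before it. The sketch below applies only that hierarchy rule to a plain list of levels and skips the page-range and row-shape checks. `check_levels` is a hypothetical helper.

```python
def check_levels(levels):
    """Raise ValueError unless levels start at 1 and never deepen by more than 1."""
    if not levels or levels[0] != 1:
        raise ValueError("hierarchy level of item 0 must be 1")
    for i in range(1, len(levels)):
        lvl = levels[i]
        # a row may go back up any number of levels, but down by at most one
        if not isinstance(lvl, int) or lvl < 1 or lvl > levels[i - 1] + 1:
            raise ValueError("bad hierarchy level in row %i" % i)
```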
+
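setToC links each new outline item to its parent through `lvltab`, which records the most recent item at each level. It also counts direct children on the parent, using negative counts for levels beyond `collapse` so that those entries start folded. The sketch below reproduces that bookkeeping on a list of levels, with index 0 standing for the outline root. `link_outline` is a hypothetical helper and writes no PDF objects.

```python
def link_outline(levels, collapse=1):
    """Return (parents, counts) for outline items 1..n; index 0 is the root."""
    last_at_level = {0: 0}  # most recent item number seen at each level
    parents = [-1]  # the root has no parent
    counts = [0] * (len(levels) + 1)
    for i, lvl in enumerate(levels, start=1):
        parent = last_at_level[lvl - 1]
        parents.append(parent)
        last_at_level[lvl] = i
        # children beyond the collapse level are counted negatively (folded)
        if collapse and lvl > collapse:
            counts[parent] -= 1
        else:
            counts[parent] += 1
    return parents, counts
```

For levels `[1, 2, 2, 1, 2, 3]`, items 2 and 3 hang under item 1, item 5 under item 4, and item 6 under item 5.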
+def do_links(doc1, doc2, from_page=-1, to_page=-1, start_at=-1):
+    """Insert links contained in copied page range into destination PDF.
+
+    Parameter values **must** equal those of method insertPDF(), which must
+    have been previously executed.
+    """
+    # --------------------------------------------------------------------------
+    # internal function to create the actual "/Annots" object string
+    # --------------------------------------------------------------------------
+    def cre_annot(lnk, xref_dst, pno_src, ctm):
+        """Create annotation object string for a passed-in link.
+        """
+
+        r = lnk["from"] * ctm  # rect in PDF coordinates
+        rect = "%g %g %g %g" % tuple(r)
+        if lnk["kind"] == LINK_GOTO:
+            txt = annot_skel["goto1"]  # annot_goto
+            idx = pno_src.index(lnk["page"])
+            p = lnk["to"] * ctm  # target point in PDF coordinates
+            annot = txt % (xref_dst[idx], p.x, p.y, rect)
+
+        elif lnk["kind"] == LINK_GOTOR:
+            if lnk["page"] >= 0:
+                txt = annot_skel["gotor1"]  # annot_gotor
+                pnt = lnk.get("to", Point(0, 0))  # destination point
+                if type(pnt) is not Point:
+                    pnt = Point(0, 0)
+                annot = txt % (
+                    lnk["page"],
+                    pnt.x,
+                    pnt.y,
+                    lnk["file"],
+                    lnk["file"],
+                    rect,
+                )
+            else:
+                txt = annot_skel["gotor2"]  # annot_gotor_n
+                to = getPDFstr(lnk["to"])
+                to = to[1:-1]
+                f = lnk["file"]
+                annot = txt % (to, f, rect)
+
+        elif lnk["kind"] == LINK_LAUNCH:
+            txt = annot_skel["launch"]  # annot_launch
+            annot = txt % (lnk["file"], lnk["file"], rect)
+
+        elif lnk["kind"] == LINK_URI:
+            txt = annot_skel["uri"]  # annot_uri
+            annot = txt % (lnk["uri"], rect)
+
+        else:
+            annot = ""
+
+        return annot
+
+    # --------------------------------------------------------------------------
+
+    # validate & normalize parameters
+    if from_page < 0:
+        fp = 0
+    elif from_page >= doc2.pageCount:
+        fp = doc2.pageCount - 1
+    else:
+        fp = from_page
+
+    if to_page < 0 or to_page >= doc2.pageCount:
+        tp = doc2.pageCount - 1
+    else:
+        tp = to_page
+
+    if start_at < 0:
+        raise ValueError("'start_at' must be >= 0")
+    sa = start_at
+
+    incr = 1 if fp <= tp else -1  # page range could be reversed
+
+    # lists of source / destination page numbers
+    pno_src = list(range(fp, tp + incr, incr))
+    pno_dst = [sa + i for i in range(len(pno_src))]
+
+    # lists of source / destination page xrefs
+    xref_src = []
+    xref_dst = []
+    for i in range(len(pno_src)):
+        p_src = pno_src[i]
+        p_dst = pno_dst[i]
+        old_xref = doc2._getPageObjNumber(p_src)[0]
+        new_xref = doc1._getPageObjNumber(p_dst)[0]
+        xref_src.append(old_xref)
+        xref_dst.append(new_xref)
+
+    # create the links for each copied page in destination PDF
+    for i in range(len(xref_src)):
+        page_src = doc2[pno_src[i]]  # load source page
+        links = page_src.getLinks()  # get all its links
+        if len(links) == 0:  # no links there
+            page_src = None
+            continue
+        ctm = ~page_src.transformationMatrix  # calc page transformation matrix
+        page_dst = doc1[pno_dst[i]]  # load destination page
+        link_tab = []  # store all link definitions here
+        for l in links:
+            if l["kind"] == LINK_GOTO and (l["page"] not in pno_src):
+                continue  # GOTO link target not in copied pages
+            annot_text = cre_annot(l, xref_dst, pno_src, ctm)
+            if not annot_text:
+                print("cannot create /Annot for kind: " + str(l["kind"]))
+            else:
+                link_tab.append(annot_text)
+        if len(link_tab) > 0:
+            page_dst._addAnnot_FromString(link_tab)
+        page_dst = None
+        page_src = None
+    return
+
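do_links rebuilds the correspondence between source and destination pages from the same parameters that insertPDF received. A reversed range such as pages 5 down to 2 is copied in that order. GOTO links whose target page lies outside the copied range are dropped. The sketch below computes that page mapping as a dictionary. `page_mapping` is a hypothetical helper that needs no Document objects.

```python
def page_mapping(page_count, from_page=-1, to_page=-1, start_at=0):
    """Map source page numbers of a copied range to destination page numbers."""
    if start_at < 0:
        raise ValueError("'start_at' must be >= 0")
    fp = 0 if from_page < 0 else min(from_page, page_count - 1)
    if to_page < 0 or to_page >= page_count:
        tp = page_count - 1
    else:
        tp = to_page
    incr = 1 if fp <= tp else -1  # the page range may be reversed
    src = list(range(fp, tp + incr, incr))
    return {p: start_at + i for i, p in enumerate(src)}
```

A GOTO link would be kept only if its target page is a key of this mapping.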
+
+def getLinkText(page, lnk):
+    """Return the /Annots object text for link dictionary lnk.
+
+    Notes:
+        Fills the templates in annot_skel; returns "" for unsupported kinds.
+    """
+    ctm = page.transformationMatrix
+    ictm = ~ctm
+    r = lnk["from"]
+    height = page.rect.height
+    rect = "%g %g %g %g" % tuple(r * ictm)
+
+    annot = ""
+    if lnk["kind"] == LINK_GOTO:
+        if lnk["page"] >= 0:
+            txt = annot_skel["goto1"]  # annot_goto
+            pno = lnk["page"]
+            xref = page.parent._getPageXref(pno)[0]
+            pnt = lnk.get("to", Point(0, 0))  # destination point
+            ipnt = pnt * ictm
+            annot = txt % (xref, ipnt.x, ipnt.y, rect)
+        else:
+            txt = annot_skel["goto2"]  # annot_goto_n
+            annot = txt % (getPDFstr(lnk["to"]), rect)
+
+    elif lnk["kind"] == LINK_GOTOR:
+        if lnk["page"] >= 0:
+            txt = annot_skel["gotor1"]  # annot_gotor
+            pnt = lnk.get("to", Point(0, 0))  # destination point
+            if type(pnt) is not Point:
+                pnt = Point(0, 0)
+            annot = txt % (lnk["page"], pnt.x, pnt.y, lnk["file"], lnk["file"], rect)
+        else:
+            txt = annot_skel["gotor2"]  # annot_gotor_n
+            annot = txt % (getPDFstr(lnk["to"]), lnk["file"], rect)
+
+    elif lnk["kind"] == LINK_LAUNCH:
+        txt = annot_skel["launch"]  # annot_launch
+        annot = txt % (lnk["file"], lnk["file"], rect)
+
+    elif lnk["kind"] == LINK_URI:
+        txt = annot_skel["uri"]  # txt = annot_uri
+        annot = txt % (lnk["uri"], rect)
+
+    elif lnk["kind"] == LINK_NAMED:
+        txt = annot_skel["named"]  # annot_named
+        annot = txt % (lnk["name"], rect)
+
+    return annot
+
+
+def updateLink(page, lnk):
+    """ Update a link on the current page. """
+    CheckParent(page)
+    annot = getLinkText(page, lnk)
+    if annot == "":
+        raise ValueError("link kind not supported")
+
+    page.parent._updateObject(lnk["xref"], annot, page=page)
+    return
+
+
+def insertLink(page, lnk, mark=True):
+    """ Insert a new link for the current page. """
+    CheckParent(page)
+    annot = getLinkText(page, lnk)
+    if annot == "":
+        raise ValueError("link kind not supported")
+
+    page._addAnnot_FromString([annot])
+    return
+
+
+def insertTextbox(
+    page,
+    rect,
+    buffer,
+    fontname="helv",
+    fontfile=None,
+    set_simple=0,
+    encoding=0,
+    fontsize=11,
+    color=None,
+    fill=None,
+    expandtabs=1,
+    align=0,
+    rotate=0,
+    render_mode=0,
+    border_width=1,
+    morph=None,
+    overlay=True,
+):
+    """ Insert text into a given rectangle.
+
+    Notes:
+        Creates a Shape object, uses its same-named method and commits it.
+    Parameters:
+        rect: (rect-like) area to use for text.
+        buffer: text to be inserted
+        fontname: a Base-14 font, font name or '/name'
+        fontfile: name of a font file
+        fontsize: font size
+        color: RGB color triple
+        expandtabs: handles tabulators with string function
+        align: left, center, right, justified
+        rotate: 0, 90, 180, or 270 degrees
+        morph: morph box with a matrix and a fixpoint
+        overlay: put text in foreground or background
+    Returns:
+        unused or deficit rectangle area (float)
+    """
+    img = page.newShape()
+    rc = img.insertTextbox(
+        rect,
+        buffer,
+        fontsize=fontsize,
+        fontname=fontname,
+        fontfile=fontfile,
+        set_simple=set_simple,
+        encoding=encoding,
+        color=color,
+        fill=fill,
+        expandtabs=expandtabs,
+        render_mode=render_mode,
+        border_width=border_width,
+        align=align,
+        rotate=rotate,
+        morph=morph,
+    )
+    if rc >= 0:
+        img.commit(overlay)
+    return rc
+
+
+def insertText(
+    page,
+    point,
+    text,
+    fontsize=11,
+    fontname="helv",
+    fontfile=None,
+    set_simple=0,
+    encoding=0,
+    color=None,
+    fill=None,
+    border_width=1,
+    render_mode=0,
+    rotate=0,
+    morph=None,
+    overlay=True,
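+):
+    """ Insert text starting at a given point.
+
+    Notes:
+        Creates a Shape object, uses its same-named method and commits it.
+    """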
+    img = page.newShape()
+    rc = img.insertText(
+        point,
+        text,
+        fontsize=fontsize,
+        fontname=fontname,
+        fontfile=fontfile,
+        set_simple=set_simple,
+        encoding=encoding,
+        color=color,
+        fill=fill,
+        border_width=border_width,
+        render_mode=render_mode,
+        rotate=rotate,
+        morph=morph,
+    )
+    if rc >= 0:
+        img.commit(overlay)
+    return rc
+
+
+def newPage(doc, pno=-1, width=595, height=842):
+    """Create and return a new page object.
+    """
+    doc._newPage(pno, width=width, height=height)
+    return doc[pno]
+
+
+def insertPage(
+    doc,
+    pno,
+    text=None,
+    fontsize=11,
+    width=595,
+    height=842,
+    fontname="helv",
+    fontfile=None,
+    color=None,
+):
+    """Create a new PDF page and insert some text.
+
+    Notes:
+        Function combining Document.newPage() and Page.insertText().
+        For parameter details see these methods. If text is given, it starts
+        at point (50, 72); otherwise no text is inserted and 0 is returned.
+    """
+    page = doc.newPage(pno=pno, width=width, height=height)
+    if not bool(text):
+        return 0
+    rc = page.insertText(
+        (50, 72),
+        text,
+        fontsize=fontsize,
+        fontname=fontname,
+        fontfile=fontfile,
+        color=color,
+    )
+    return rc
+
+
+def drawLine(
+    page,
+    p1,
+    p2,
+    color=None,
+    dashes=None,
+    width=1,
+    lineCap=0,
+    lineJoin=0,
+    overlay=True,
+    morph=None,
+    roundcap=None,
+):
+    """Draw a line from point p1 to point p2.
+    """
+    img = page.newShape()
+    p = img.drawLine(Point(p1), Point(p2))
+    img.finish(
+        color=color,
+        dashes=dashes,
+        width=width,
+        closePath=False,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundcap,
+    )
+    img.commit(overlay)
+
+    return p
+
+
+def drawSquiggle(
+    page,
+    p1,
+    p2,
+    breadth=2,
+    color=None,
+    dashes=None,
+    width=1,
+    lineCap=0,
+    lineJoin=0,
+    overlay=True,
+    morph=None,
+    roundCap=None,
+):
+    """Draw a squiggly line from point p1 to point p2.
+    """
+    img = page.newShape()
+    p = img.drawSquiggle(Point(p1), Point(p2), breadth=breadth)
+    img.finish(
+        color=color,
+        dashes=dashes,
+        width=width,
+        closePath=False,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+    )
+    img.commit(overlay)
+
+    return p
+
+
+def drawZigzag(
+    page,
+    p1,
+    p2,
+    breadth=2,
+    color=None,
+    dashes=None,
+    width=1,
+    lineCap=0,
+    lineJoin=0,
+    overlay=True,
+    morph=None,
+    roundCap=None,
+):
+    """Draw a zigzag line from point p1 to point p2.
+    """
+    img = page.newShape()
+    p = img.drawZigzag(Point(p1), Point(p2), breadth=breadth)
+    img.finish(
+        color=color,
+        dashes=dashes,
+        width=width,
+        closePath=False,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+    )
+    img.commit(overlay)
+
+    return p
+
+
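`drawSquiggle()` and `drawZigzag()` delegate the geometry to the corresponding `Shape` methods, whose bodies are not shown in this excerpt. As a rough illustration of what a `breadth` parameter can control, here is a self-contained zigzag generator whose vertices alternate at distance `breadth` on either side of the line from p1 to p2. This is a hypothetical sketch, not PyMuPDF's algorithm; the function name and the vertex spacing of 2 * breadth are assumptions.

```python
import math
from typing import List, Tuple

Point2 = Tuple[float, float]


def zigzag_points(p1: Point2, p2: Point2, breadth: float = 2.0) -> List[Point2]:
    """Return the vertices of a zigzag running from p1 towards p2.

    Hypothetical geometry: consecutive vertices are 2 * breadth apart along
    the line and are pushed alternately +breadth / -breadth along the normal.
    Any leftover length shorter than one segment is dropped.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0 or breadth <= 0:
        return [p1]
    ux, uy = dx / length, dy / length  # unit vector along the line
    nx, ny = -uy, ux                   # unit normal to the line
    n = int(length // (2 * breadth))   # number of whole segments that fit
    if n < 2:
        # Too short for a single peak: fall back to a straight line.
        return [p1, p2]
    points = []
    for i in range(n + 1):
        t = i * 2 * breadth
        # End points stay on the line; interior points alternate sides.
        offset = 0.0 if i in (0, n) else (breadth if i % 2 else -breadth)
        points.append((p1[0] + ux * t + nx * offset, p1[1] + uy * t + ny * offset))
    return points


print(zigzag_points((0, 0), (20, 0), breadth=2))
```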
+def drawRect(
+    page,
+    rect,
+    color=None,
+    fill=None,
+    dashes=None,
+    width=1,
+    lineCap=0,
+    lineJoin=0,
+    morph=None,
+    roundCap=None,
+    overlay=True,
+):
+    """Draw a rectangle.
+    """
+    img = page.newShape()
+    Q = img.drawRect(Rect(rect))
+    img.finish(
+        color=color,
+        fill=fill,
+        dashes=dashes,
+        width=width,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+    )
+    img.commit(overlay)
+
+    return Q
+
+
+def drawQuad(
+    page,
+    quad,
+    color=None,
+    fill=None,
+    dashes=None,
+    width=1,
+    lineCap=0,
+    lineJoin=0,
+    morph=None,
+    roundCap=None,
+    overlay=True,
+):
+    """Draw a quadrilateral.
+    """
+    img = page.newShape()
+    Q = img.drawQuad(Quad(quad))
+    img.finish(
+        color=color,
+        fill=fill,
+        dashes=dashes,
+        width=width,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+    )
+    img.commit(overlay)
+
+    return Q
+
+
+def drawPolyline(
+    page,
+    points,
+    color=None,
+    fill=None,
+    dashes=None,
+    width=1,
+    morph=None,
+    lineCap=0,
+    lineJoin=0,
+    roundCap=None,
+    overlay=True,
+    closePath=False,
+):
+    """Draw multiple connected line segments.
+    """
+    img = page.newShape()
+    Q = img.drawPolyline(points)
+    img.finish(
+        color=color,
+        fill=fill,
+        dashes=dashes,
+        width=width,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+        closePath=closePath,
+    )
+    img.commit(overlay)
+
+    return Q
+
+
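With `closePath=True`, the stroke of `drawPolyline()` gains one extra segment from the last point back to the first. The segment bookkeeping can be sketched without PyMuPDF as follows (the helper name is hypothetical). Under PDF's path-painting rules a fill operator implicitly closes open subpaths, so `fill` should colour the same area either way; `closePath` mainly changes what gets stroked.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Segment = Tuple[Point2, Point2]


def polyline_segments(points: Sequence[Point2], close_path: bool = False) -> List[Segment]:
    """List the straight segments a polyline stroke would consist of."""
    segments = list(zip(points, points[1:]))
    # Closing only makes sense once there are at least three points.
    if close_path and len(points) > 2:
        segments.append((points[-1], points[0]))
    return segments


triangle = [(0, 0), (10, 0), (5, 8)]
print(len(polyline_segments(triangle)))                   # 2
print(len(polyline_segments(triangle, close_path=True)))  # 3
```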
+def drawCircle(
+    page,
+    center,
+    radius,
+    color=None,
+    fill=None,
+    morph=None,
+    dashes=None,
+    width=1,
+    lineCap=0,
+    lineJoin=0,
+    roundCap=None,
+    overlay=True,
+):
+    """Draw a circle given its center and radius.
+    """
+    img = page.newShape()
+    Q = img.drawCircle(Point(center), radius)
+    img.finish(
+        color=color,
+        fill=fill,
+        dashes=dashes,
+        width=width,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+    )
+    img.commit(overlay)
+    return Q
+
+
+def drawOval(
+    page,
+    rect,
+    color=None,
+    fill=None,
+    dashes=None,
+    morph=None,
+    roundCap=None,
+    width=1,
+    lineCap=0,
+    lineJoin=0,
+    overlay=True,
+):
+    """Draw an oval given its containing rectangle or quad.
+    """
+    img = page.newShape()
+    Q = img.drawOval(rect)
+    img.finish(
+        color=color,
+        fill=fill,
+        dashes=dashes,
+        width=width,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+    )
+    img.commit(overlay)
+
+    return Q
+
+
+def drawCurve(
+    page,
+    p1,
+    p2,
+    p3,
+    color=None,
+    fill=None,
+    dashes=None,
+    width=1,
+    morph=None,
+    roundCap=None,
+    closePath=False,
+    lineCap=0,
+    lineJoin=0,
+    overlay=True,
+):
+    """Draw a special Bezier curve from p1 to p3, generating control points on lines p1 to p2 and p2 to p3.
+    """
+    img = page.newShape()
+    Q = img.drawCurve(Point(p1), Point(p2), Point(p3))
+    img.finish(
+        color=color,
+        fill=fill,
+        dashes=dashes,
+        width=width,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+        closePath=closePath,
+    )
+    img.commit(overlay)
+
+    return Q
+
+
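`drawCurve()` takes only p1, p2 and p3 and derives the two Bezier control points itself. A common construction, and to my understanding the one `Shape.drawCurve` uses, places them a fraction kappa = 4/3 * (sqrt(2) - 1), about 0.5523, of the way from p1 towards p2 and from p3 towards p2; treat that constant as an assumption. With p1, p2, p3 chosen as (1, 0), (1, 1), (0, 1), the resulting curve closely approximates a quarter of the unit circle:

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]

# Standard constant for approximating a quarter circle with one cubic Bezier.
KAPPA = 4.0 / 3.0 * (math.sqrt(2.0) - 1.0)  # about 0.5523


def curve_control_points(p1: Point2, p2: Point2, p3: Point2, kappa: float = KAPPA):
    """Return (p1, k1, k2, p3): the cubic Bezier implied by p1 -> p2 -> p3."""
    k1 = (p1[0] + (p2[0] - p1[0]) * kappa, p1[1] + (p2[1] - p1[1]) * kappa)
    k2 = (p3[0] + (p2[0] - p3[0]) * kappa, p3[1] + (p2[1] - p3[1]) * kappa)
    return p1, k1, k2, p3


# Quarter of the unit circle: from (1, 0) to (0, 1), with corner (1, 1).
b0, b1, b2, b3 = curve_control_points((1, 0), (1, 1), (0, 1))
# Curve midpoint via the Bernstein form at t = 0.5: weights 1/8, 3/8, 3/8, 1/8.
mid_x = (b0[0] + 3 * b1[0] + 3 * b2[0] + b3[0]) / 8
mid_y = (b0[1] + 3 * b1[1] + 3 * b2[1] + b3[1]) / 8
print(math.hypot(mid_x, mid_y))  # very close to 1.0
```

This particular kappa is chosen so that the curve's midpoint lands exactly on the circle; elsewhere the deviation stays small.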
+def drawBezier(
+    page,
+    p1,
+    p2,
+    p3,
+    p4,
+    color=None,
+    fill=None,
+    dashes=None,
+    width=1,
+    morph=None,
+    roundCap=None,
+    closePath=False,
+    lineCap=0,
+    lineJoin=0,
+    overlay=True,
+):
+    """Draw a general cubic Bezier curve from p1 to p4 using control points p2 and p3.
+    """
+    img = page.newShape()
+    Q = img.drawBezier(Point(p1), Point(p2), Point(p3), Point(p4))
+    img.finish(
+        color=color,
+        fill=fill,
+        dashes=dashes,
+        width=width,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+        closePath=closePath,
+    )
+    img.commit(overlay)
+
+    return Q
+
+
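`drawBezier()` passes its four points through as a general cubic Bezier: the curve starts at p1, ends at p4, and is pulled towards the control points p2 and p3 without generally passing through them. A point on such a curve can be computed with de Casteljau's algorithm (repeated linear interpolation). This sketch is for illustration and is not part of PyMuPDF:

```python
from typing import Tuple

Point2 = Tuple[float, float]


def lerp(a: Point2, b: Point2, u: float) -> Point2:
    """Linear interpolation between points a and b."""
    return (a[0] + (b[0] - a[0]) * u, a[1] + (b[1] - a[1]) * u)


def bezier_point(p1: Point2, p2: Point2, p3: Point2, p4: Point2, t: float) -> Point2:
    """Evaluate the cubic Bezier p1..p4 at parameter t in [0, 1]."""
    a, b, c = lerp(p1, p2, t), lerp(p2, p3, t), lerp(p3, p4, t)  # level 1
    d, e = lerp(a, b, t), lerp(b, c, t)                          # level 2
    return lerp(d, e, t)                                         # point on curve


print(bezier_point((0, 0), (0, 1), (1, 1), (1, 0), 0.5))  # (0.5, 0.75)
```

Note that the midpoint (0.5, 0.75) stays below the control points at height 1, illustrating that p2 and p3 are not on the curve.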
+def drawSector(
+    page,
+    center,
+    point,
+    beta,
+    color=None,
+    fill=None,
+    dashes=None,
+    fullSector=True,
+    morph=None,
+    roundCap=None,
+    width=1,
+    closePath=False,
+    lineCap=0,
+    lineJoin=0,
+    overlay=True,
+):
+    """ Draw a circle sector given circle center, one arc end point and the angle of the arc.
+
+    Parameters:
+        center -- center of circle
+        point -- arc end point
+        beta -- angle of arc (degrees)
+        fullSector -- connect arc ends with center
+    """
+    img = page.newShape()
+    Q = img.drawSector(Point(center), Point(point), beta, fullSector=fullSector)
+    img.finish(
+        color=color,
+        fill=fill,
+        dashes=dashes,
+        width=width,
+        lineCap=lineCap,
+        lineJoin=lineJoin,
+        morph=morph,
+        roundCap=roundCap,
+        closePath=closePath,
+    )
+    img.commit(overlay)
+
+    return Q
+
+
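`drawSector()` receives the circle centre, one arc end point and the arc angle `beta` in degrees; the other arc end point is `point` rotated about `center` by beta. The rotation below uses the usual mathematical orientation (y axis up, positive angles counter-clockwise). PyMuPDF page coordinates have the y axis pointing down, so the apparent turning direction on the page may be mirrored; the sign convention of `Shape.drawSector` should be checked in its documentation before relying on this. The helper name `rotate_about` is hypothetical.

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]


def rotate_about(center: Point2, point: Point2, beta_deg: float) -> Point2:
    """Rotate `point` about `center` by beta_deg degrees (y-up convention)."""
    beta = math.radians(beta_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_b, sin_b = math.cos(beta), math.sin(beta)
    return (
        center[0] + dx * cos_b - dy * sin_b,
        center[1] + dx * sin_b + dy * cos_b,
    )


center, start = (100.0, 100.0), (150.0, 100.0)
end = rotate_about(center, start, 90)
print(end)  # approximately (100.0, 150.0)
```

The rotation preserves the distance to the centre, so both arc ends lie on the same circle of radius 50.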
+# ----------------------------------------------------------------------
+# Name:        wx.lib.colourdb.py
+# Purpose:     Adds a bunch of colour names and RGB values to the
+#              colour database so they can be found by name
+#
+# Author:      Robin Dunn
+#
+# Created:     13-March-2001
+# Copyright:   (c) 2001-2017 by Total Control Software
+# Licence:     wxWindows license
+# Tags:        phoenix-port, unittest, documented
+# ----------------------------------------------------------------------
+
+
+def getColorList():
+    """
+    Returns a list of just the colour names used by this module.
+    :rtype: list of strings
+    """
+
+    return [x[0] for x in getColorInfoList()]
+
+
+def getColorInfoList():
+    """
+    Returns the list of colour name/value tuples used by this module.
+    :rtype: list of tuples
+    """
+
+    return [
+        ("ALICEBLUE", 240, 248, 255),
+        ("ANTIQUEWHITE", 250, 235, 215),
+        ("ANTIQUEWHITE1", 255, 239, 219),
+        ("ANTIQUEWHITE2", 238, 223, 204),
+        ("ANTIQUEWHITE3", 205, 192, 176),
+        ("ANTIQUEWHITE4", 139, 131, 120),
+        ("AQUAMARINE", 127, 255, 212),
+        ("AQUAMARINE1", 127, 255, 212),
+        ("AQUAMARINE2", 118, 238, 198),
+        ("AQUAMARINE3", 102, 205, 170),
+        ("AQUAMARINE4", 69, 139, 116),
+        ("AZURE", 240, 255, 255),
+        ("AZURE1", 240, 255, 255),
+        ("AZURE2", 224, 238, 238),
+        ("AZURE3", 193, 205, 205),
+        ("AZURE4", 131, 139, 139),
+        ("BEIGE", 245, 245, 220),
+        ("BISQUE", 255, 228, 196),
+        ("BISQUE1", 255, 228, 196),
+        ("BISQUE2", 238, 213, 183),
+        ("BISQUE3", 205, 183, 158),
+        ("BISQUE4", 139, 125, 107),
+        ("BLACK", 0, 0, 0),
+        ("BLANCHEDALMOND", 255, 235, 205),
+        ("BLUE", 0, 0, 255),
+        ("BLUE1", 0, 0, 255),
+        ("BLUE2", 0, 0, 238),
+        ("BLUE3", 0, 0, 205),
+        ("BLUE4", 0, 0, 139),
+        ("BLUEVIOLET", 138, 43, 226),
+        ("BROWN", 165, 42, 42),
+        ("BROWN1", 255, 64, 64),
+        ("BROWN2", 238, 59, 59),
+        ("BROWN3", 205, 51, 51),
+        ("BROWN4", 139, 35, 35),
+        ("BURLYWOOD", 222, 184, 135),
+        ("BURLYWOOD1", 255, 211, 155),
+        ("BURLYWOOD2", 238, 197, 145),
+        ("BURLYWOOD3", 205, 170, 125),
+        ("BURLYWOOD4", 139, 115, 85),
+        ("CADETBLUE", 95, 158, 160),
+        ("CADETBLUE1", 152, 245, 255),
+        ("CADETBLUE2", 142, 229, 238),
+        ("CADETBLUE3", 122, 197, 205),
+        ("CADETBLUE4", 83, 134, 139),
+        ("CHARTREUSE", 127, 255, 0),
+        ("CHARTREUSE1", 127, 255, 0),
+        ("CHARTREUSE2", 118, 238, 0),
+        ("CHARTREUSE3", 102, 205, 0),
+        ("CHARTREUSE4", 69, 139, 0),
+        ("CHOCOLATE", 210, 105, 30),
+        ("CHOCOLATE1", 255, 127, 36),
+        ("CHOCOLATE2", 238, 118, 33),
+        ("CHOCOLATE3", 205, 102, 29),
+        ("CHOCOLATE4", 139, 69, 19),
+        ("COFFEE", 156, 79, 0),
+        ("CORAL", 255, 127, 80),
+        ("CORAL1", 255, 114, 86),
+        ("CORAL2", 238, 106, 80),
+        ("CORAL3", 205, 91, 69),
+        ("CORAL4", 139, 62, 47),
+        ("CORNFLOWERBLUE", 100, 149, 237),
+        ("CORNSILK", 255, 248, 220),
+        ("CORNSILK1", 255, 248, 220),
+        ("CORNSILK2", 238, 232, 205),
+        ("CORNSILK3", 205, 200, 177),
+        ("CORNSILK4", 139, 136, 120),
+        ("CYAN", 0, 255, 255),
+        ("CYAN1", 0, 255, 255),
+        ("CYAN2", 0, 238, 238),
+        ("CYAN3", 0, 205, 205),
+        ("CYAN4", 0, 139, 139),
+        ("DARKBLUE", 0, 0, 139),
+        ("DARKCYAN", 0, 139, 139),
+        ("DARKGOLDENROD", 184, 134, 11),
+        ("DARKGOLDENROD1", 255, 185, 15),
+        ("DARKGOLDENROD2", 238, 173, 14),
+        ("DARKGOLDENROD3", 205, 149, 12),
+        ("DARKGOLDENROD4", 139, 101, 8),
+        ("DARKGREEN", 0, 100, 0),
+        ("DARKGRAY", 169, 169, 169),
+        ("DARKKHAKI", 189, 183, 107),
+        ("DARKMAGENTA", 139, 0, 139),
+        ("DARKOLIVEGREEN", 85, 107, 47),
+        ("DARKOLIVEGREEN1", 202, 255, 112),
+        ("DARKOLIVEGREEN2", 188, 238, 104),
+        ("DARKOLIVEGREEN3", 162, 205, 90),
+        ("DARKOLIVEGREEN4", 110, 139, 61),
+        ("DARKORANGE", 255, 140, 0),
+        ("DARKORANGE1", 255, 127, 0),
+        ("DARKORANGE2", 238, 118, 0),
+        ("DARKORANGE3", 205, 102, 0),
+        ("DARKORANGE4", 139, 69, 0),
+        ("DARKORCHID", 153, 50, 204),
+        ("DARKORCHID1", 191, 62, 255),
+        ("DARKORCHID2", 178, 58, 238),
+        ("DARKORCHID3", 154, 50, 205),
+        ("DARKORCHID4", 104, 34, 139),
+        ("DARKRED", 139, 0, 0),
+        ("DARKSALMON", 233, 150, 122),
+        ("DARKSEAGREEN", 143, 188, 143),
+        ("DARKSEAGREEN1", 193, 255, 193),
+        ("DARKSEAGREEN2", 180, 238, 180),
+        ("DARKSEAGREEN3", 155, 205, 155),
+        ("DARKSEAGREEN4", 105, 139, 105),
+        ("DARKSLATEBLUE", 72, 61, 139),
+        ("DARKSLATEGRAY", 47, 79, 79),
+        ("DARKTURQUOISE", 0, 206, 209),
+        ("DARKVIOLET", 148, 0, 211),
+        ("DEEPPINK", 255, 20, 147),
+        ("DEEPPINK1", 255, 20, 147),
+        ("DEEPPINK2", 238, 18, 137),
+        ("DEEPPINK3", 205, 16, 118),
+        ("DEEPPINK4", 139, 10, 80),
+        ("DEEPSKYBLUE", 0, 191, 255),
+        ("DEEPSKYBLUE1", 0, 191, 255),
+        ("DEEPSKYBLUE2", 0, 178, 238),
+        ("DEEPSKYBLUE3", 0, 154, 205),
+        ("DEEPSKYBLUE4", 0, 104, 139),
+        ("DIMGRAY", 105, 105, 105),
+        ("DODGERBLUE", 30, 144, 255),
+        ("DODGERBLUE1", 30, 144, 255),
+        ("DODGERBLUE2", 28, 134, 238),
+        ("DODGERBLUE3", 24, 116, 205),
+        ("DODGERBLUE4", 16, 78, 139),
+        ("FIREBRICK", 178, 34, 34),
+        ("FIREBRICK1", 255, 48, 48),
+        ("FIREBRICK2", 238, 44, 44),
+        ("FIREBRICK3", 205, 38, 38),
+        ("FIREBRICK4", 139, 26, 26),
+        ("FLORALWHITE", 255, 250, 240),
+        ("FORESTGREEN", 34, 139, 34),
+        ("GAINSBORO", 220, 220, 220),
+        ("GHOSTWHITE", 248, 248, 255),
+        ("GOLD", 255, 215, 0),
+        ("GOLD1", 255, 215, 0),
+        ("GOLD2", 238, 201, 0),
+        ("GOLD3", 205, 173, 0),
+        ("GOLD4", 139, 117, 0),
+        ("GOLDENROD", 218, 165, 32),
+        ("GOLDENROD1", 255, 193, 37),
+        ("GOLDENROD2", 238, 180, 34),
+        ("GOLDENROD3", 205, 155, 29),
+        ("GOLDENROD4", 139, 105, 20),
+        ("GREEN YELLOW", 173, 255, 47),
+        ("GREEN", 0, 255, 0),
+        ("GREEN1", 0, 255, 0),
+        ("GREEN2", 0, 238, 0),
+        ("GREEN3", 0, 205, 0),
+        ("GREEN4", 0, 139, 0),
+        ("GREENYELLOW", 173, 255, 47),
+        ("GRAY", 190, 190, 190),
+        ("GRAY0", 0, 0, 0),
+        ("GRAY1", 3, 3, 3),
+        ("GRAY10", 26, 26, 26),
+        ("GRAY100", 255, 255, 255),
+        ("GRAY11", 28, 28, 28),
+        ("GRAY12", 31, 31, 31),
+        ("GRAY13", 33, 33, 33),
+        ("GRAY14", 36, 36, 36),
+        ("GRAY15", 38, 38, 38),
+        ("GRAY16", 41, 41, 41),
+        ("GRAY17", 43, 43, 43),
+        ("GRAY18", 46, 46, 46),
+        ("GRAY19", 48, 48, 48),
+        ("GRAY2", 5, 5, 5),
+        ("GRAY20", 51, 51, 51),
+        ("GRAY21", 54, 54, 54),
+        ("GRAY22", 56, 56, 56),
+        ("GRAY23", 59, 59, 59),
+        ("GRAY24", 61, 61, 61),
+        ("GRAY25", 64, 64, 64),
+        ("GRAY26", 66, 66, 66),
+        ("GRAY27", 69, 69, 69),
+        ("GRAY28", 71, 71, 71),
+        ("GRAY29", 74, 74, 74),
+        ("GRAY3", 8, 8, 8),
+        ("GRAY30", 77, 77, 77),
+        ("GRAY31", 79, 79, 79),
+        ("GRAY32", 82, 82, 82),
+        ("GRAY33", 84, 84, 84),
+        ("GRAY34", 87, 87, 87),
+        ("GRAY35", 89, 89, 89),
+        ("GRAY36", 92, 92, 92),
+        ("GRAY37", 94, 94, 94),
+        ("GRAY38", 97, 97, 97),
+        ("GRAY39", 99, 99, 99),
+        ("GRAY4", 10, 10, 10),
+        ("GRAY40", 102, 102, 102),
+        ("GRAY41", 105, 105, 105),
+        ("GRAY42", 107, 107, 107),
+        ("GRAY43", 110, 110, 110),
+        ("GRAY44", 112, 112, 112),
+        ("GRAY45", 115, 115, 115),
+        ("GRAY46", 117, 117, 117),
+        ("GRAY47", 120, 120, 120),
+        ("GRAY48", 122, 122, 122),
+        ("GRAY49", 125, 125, 125),
+        ("GRAY5", 13, 13, 13),
+        ("GRAY50", 127, 127, 127),
+        ("GRAY51", 130, 130, 130),
+        ("GRAY52", 133, 133, 133),
+        ("GRAY53", 135, 135, 135),
+        ("GRAY54", 138, 138, 138),
+        ("GRAY55", 140, 140, 140),
+        ("GRAY56", 143, 143, 143),
+        ("GRAY57", 145, 145, 145),
+        ("GRAY58", 148, 148, 148),
+        ("GRAY59", 150, 150, 150),
+        ("GRAY6", 15, 15, 15),
+        ("GRAY60", 153, 153, 153),
+        ("GRAY61", 156, 156, 156),
+        ("GRAY62", 158, 158, 158),
+        ("GRAY63", 161, 161, 161),
+        ("GRAY64", 163, 163, 163),
+        ("GRAY65", 166, 166, 166),
+        ("GRAY66", 168, 168, 168),
+        ("GRAY67", 171, 171, 171),
+        ("GRAY68", 173, 173, 173),
+        ("GRAY69", 176, 176, 176),
+        ("GRAY7", 18, 18, 18),
+        ("GRAY70", 179, 179, 179),
+        ("GRAY71", 181, 181, 181),
+        ("GRAY72", 184, 184, 184),
+        ("GRAY73", 186, 186, 186),
+        ("GRAY74", 189, 189, 189),
+        ("GRAY75", 191, 191, 191),
+        ("GRAY76", 194, 194, 194),
+        ("GRAY77", 196, 196, 196),
+        ("GRAY78", 199, 199, 199),
+        ("GRAY79", 201, 201, 201),
+        ("GRAY8", 20, 20, 20),
+        ("GRAY80", 204, 204, 204),
+        ("GRAY81", 207, 207, 207),
+        ("GRAY82", 209, 209, 209),
+        ("GRAY83", 212, 212, 212),
+        ("GRAY84", 214, 214, 214),
+        ("GRAY85", 217, 217, 217),
+        ("GRAY86", 219, 219, 219),
+        ("GRAY87", 222, 222, 222),
+        ("GRAY88", 224, 224, 224),
+        ("GRAY89", 227, 227, 227),
+        ("GRAY9", 23, 23, 23),
+        ("GRAY90", 229, 229, 229),
+        ("GRAY91", 232, 232, 232),
+        ("GRAY92", 235, 235, 235),
+        ("GRAY93", 237, 237, 237),
+        ("GRAY94", 240, 240, 240),
+        ("GRAY95", 242, 242, 242),
+        ("GRAY96", 245, 245, 245),
+        ("GRAY97", 247, 247, 247),
+        ("GRAY98", 250, 250, 250),
+        ("GRAY99", 252, 252, 252),
+        ("HONEYDEW", 240, 255, 240),
+        ("HONEYDEW1", 240, 255, 240),
+        ("HONEYDEW2", 224, 238, 224),
+        ("HONEYDEW3", 193, 205, 193),
+        ("HONEYDEW4", 131, 139, 131),
+        ("HOTPINK", 255, 105, 180),
+        ("HOTPINK1", 255, 110, 180),
+        ("HOTPINK2", 238, 106, 167),
+        ("HOTPINK3", 205, 96, 144),
+        ("HOTPINK4", 139, 58, 98),
+        ("INDIANRED", 205, 92, 92),
+        ("INDIANRED1", 255, 106, 106),
+        ("INDIANRED2", 238, 99, 99),
+        ("INDIANRED3", 205, 85, 85),
+        ("INDIANRED4", 139, 58, 58),
+        ("IVORY", 255, 255, 240),
+        ("IVORY1", 255, 255, 240),
+        ("IVORY2", 238, 238, 224),
+        ("IVORY3", 205, 205, 193),
+        ("IVORY4", 139, 139, 131),
+        ("KHAKI", 240, 230, 140),
+        ("KHAKI1", 255, 246, 143),
+        ("KHAKI2", 238, 230, 133),
+        ("KHAKI3", 205, 198, 115),
+        ("KHAKI4", 139, 134, 78),
+        ("LAVENDER", 230, 230, 250),
+        ("LAVENDERBLUSH", 255, 240, 245),
+        ("LAVENDERBLUSH1", 255, 240, 245),
+        ("LAVENDERBLUSH2", 238, 224, 229),
+        ("LAVENDERBLUSH3", 205, 193, 197),
+        ("LAVENDERBLUSH4", 139, 131, 134),
+        ("LAWNGREEN", 124, 252, 0),
+        ("LEMONCHIFFON", 255, 250, 205),
+        ("LEMONCHIFFON1", 255, 250, 205),
+        ("LEMONCHIFFON2", 238, 233, 191),
+        ("LEMONCHIFFON3", 205, 201, 165),
+        ("LEMONCHIFFON4", 139, 137, 112),
+        ("LIGHTBLUE", 173, 216, 230),
+        ("LIGHTBLUE1", 191, 239, 255),
+        ("LIGHTBLUE2", 178, 223, 238),
+        ("LIGHTBLUE3", 154, 192, 205),
+        ("LIGHTBLUE4", 104, 131, 139),
+        ("LIGHTCORAL", 240, 128, 128),
+        ("LIGHTCYAN", 224, 255, 255),
+        ("LIGHTCYAN1", 224, 255, 255),
+        ("LIGHTCYAN2", 209, 238, 238),
+        ("LIGHTCYAN3", 180, 205, 205),
+        ("LIGHTCYAN4", 122, 139, 139),
+        ("LIGHTGOLDENROD", 238, 221, 130),
+        ("LIGHTGOLDENROD1", 255, 236, 139),
+        ("LIGHTGOLDENROD2", 238, 220, 130),
+        ("LIGHTGOLDENROD3", 205, 190, 112),
+        ("LIGHTGOLDENROD4", 139, 129, 76),
+        ("LIGHTGOLDENRODYELLOW", 250, 250, 210),
+        ("LIGHTGREEN", 144, 238, 144),
+        ("LIGHTGRAY", 211, 211, 211),
+        ("LIGHTPINK", 255, 182, 193),
+        ("LIGHTPINK1", 255, 174, 185),
+        ("LIGHTPINK2", 238, 162, 173),
+        ("LIGHTPINK3", 205, 140, 149),
+        ("LIGHTPINK4", 139, 95, 101),
+        ("LIGHTSALMON", 255, 160, 122),
+        ("LIGHTSALMON1", 255, 160, 122),
+        ("LIGHTSALMON2", 238, 149, 114),
+        ("LIGHTSALMON3", 205, 129, 98),
+        ("LIGHTSALMON4", 139, 87, 66),
+        ("LIGHTSEAGREEN", 32, 178, 170),
+        ("LIGHTSKYBLUE", 135, 206, 250),
+        ("LIGHTSKYBLUE1", 176, 226, 255),
+        ("LIGHTSKYBLUE2", 164, 211, 238),
+        ("LIGHTSKYBLUE3", 141, 182, 205),
+        ("LIGHTSKYBLUE4", 96, 123, 139),
+        ("LIGHTSLATEBLUE", 132, 112, 255),
+        ("LIGHTSLATEGRAY", 119, 136, 153),
+        ("LIGHTSTEELBLUE", 176, 196, 222),
+        ("LIGHTSTEELBLUE1", 202, 225, 255),
+        ("LIGHTSTEELBLUE2", 188, 210, 238),
+        ("LIGHTSTEELBLUE3", 162, 181, 205),
+        ("LIGHTSTEELBLUE4", 110, 123, 139),
+        ("LIGHTYELLOW", 255, 255, 224),
+        ("LIGHTYELLOW1", 255, 255, 224),
+        ("LIGHTYELLOW2", 238, 238, 209),
+        ("LIGHTYELLOW3", 205, 205, 180),
+        ("LIGHTYELLOW4", 139, 139, 122),
+        ("LIMEGREEN", 50, 205, 50),
+        ("LINEN", 250, 240, 230),
+        ("MAGENTA", 255, 0, 255),
+        ("MAGENTA1", 255, 0, 255),
+        ("MAGENTA2", 238, 0, 238),
+        ("MAGENTA3", 205, 0, 205),
+        ("MAGENTA4", 139, 0, 139),
+        ("MAROON", 176, 48, 96),
+        ("MAROON1", 255, 52, 179),
+        ("MAROON2", 238, 48, 167),
+        ("MAROON3", 205, 41, 144),
+        ("MAROON4", 139, 28, 98),
+        ("MEDIUMAQUAMARINE", 102, 205, 170),
+        ("MEDIUMBLUE", 0, 0, 205),
+        ("MEDIUMORCHID", 186, 85, 211),
+        ("MEDIUMORCHID1", 224, 102, 255),
+        ("MEDIUMORCHID2", 209, 95, 238),
+        ("MEDIUMORCHID3", 180, 82, 205),
+        ("MEDIUMORCHID4", 122, 55, 139),
+        ("MEDIUMPURPLE", 147, 112, 219),
+        ("MEDIUMPURPLE1", 171, 130, 255),
+        ("MEDIUMPURPLE2", 159, 121, 238),
+        ("MEDIUMPURPLE3", 137, 104, 205),
+        ("MEDIUMPURPLE4", 93, 71, 139),
+        ("MEDIUMSEAGREEN", 60, 179, 113),
+        ("MEDIUMSLATEBLUE", 123, 104, 238),
+        ("MEDIUMSPRINGGREEN", 0, 250, 154),
+        ("MEDIUMTURQUOISE", 72, 209, 204),
+        ("MEDIUMVIOLETRED", 199, 21, 133),
+        ("MIDNIGHTBLUE", 25, 25, 112),
+        ("MINTCREAM", 245, 255, 250),
+        ("MISTYROSE", 255, 228, 225),
+        ("MISTYROSE1", 255, 228, 225),
+        ("MISTYROSE2", 238, 213, 210),
+        ("MISTYROSE3", 205, 183, 181),
+        ("MISTYROSE4", 139, 125, 123),
+        ("MOCCASIN", 255, 228, 181),
+        ("MUPDFBLUE", 37, 114, 172),
+        ("NAVAJOWHITE", 255, 222, 173),
+        ("NAVAJOWHITE1", 255, 222, 173),
+        ("NAVAJOWHITE2", 238, 207, 161),
+        ("NAVAJOWHITE3", 205, 179, 139),
+        ("NAVAJOWHITE4", 139, 121, 94),
+        ("NAVY", 0, 0, 128),
+        ("NAVYBLUE", 0, 0, 128),
+        ("OLDLACE", 253, 245, 230),
+        ("OLIVEDRAB", 107, 142, 35),
+        ("OLIVEDRAB1", 192, 255, 62),
+        ("OLIVEDRAB2", 179, 238, 58),
+        ("OLIVEDRAB3", 154, 205, 50),
+        ("OLIVEDRAB4", 105, 139, 34),
+        ("ORANGE", 255, 165, 0),
+        ("ORANGE1", 255, 165, 0),
+        ("ORANGE2", 238, 154, 0),
+        ("ORANGE3", 205, 133, 0),
+        ("ORANGE4", 139, 90, 0),
+        ("ORANGERED", 255, 69, 0),
+        ("ORANGERED1", 255, 69, 0),
+        ("ORANGERED2", 238, 64, 0),
+        ("ORANGERED3", 205, 55, 0),
+        ("ORANGERED4", 139, 37, 0),
+        ("ORCHID", 218, 112, 214),
+        ("ORCHID1", 255, 131, 250),
+        ("ORCHID2", 238, 122, 233),
+        ("ORCHID3", 205, 105, 201),
+        ("ORCHID4", 139, 71, 137),
+        ("PALEGOLDENROD", 238, 232, 170),
+        ("PALEGREEN", 152, 251, 152),
+        ("PALEGREEN1", 154, 255, 154),
+        ("PALEGREEN2", 144, 238, 144),
+        ("PALEGREEN3", 124, 205, 124),
+        ("PALEGREEN4", 84, 139, 84),
+        ("PALETURQUOISE", 175, 238, 238),
+        ("PALETURQUOISE1", 187, 255, 255),
+        ("PALETURQUOISE2", 174, 238, 238),
+        ("PALETURQUOISE3", 150, 205, 205),
+        ("PALETURQUOISE4", 102, 139, 139),
+        ("PALEVIOLETRED", 219, 112, 147),
+        ("PALEVIOLETRED1", 255, 130, 171),
+        ("PALEVIOLETRED2", 238, 121, 159),
+        ("PALEVIOLETRED3", 205, 104, 137),
+        ("PALEVIOLETRED4", 139, 71, 93),
+        ("PAPAYAWHIP", 255, 239, 213),
+        ("PEACHPUFF", 255, 218, 185),
+        ("PEACHPUFF1", 255, 218, 185),
+        ("PEACHPUFF2", 238, 203, 173),
+        ("PEACHPUFF3", 205, 175, 149),
+        ("PEACHPUFF4", 139, 119, 101),
+        ("PERU", 205, 133, 63),
+        ("PINK", 255, 192, 203),
+        ("PINK1", 255, 181, 197),
+        ("PINK2", 238, 169, 184),
+        ("PINK3", 205, 145, 158),
+        ("PINK4", 139, 99, 108),
+        ("PLUM", 221, 160, 221),
+        ("PLUM1", 255, 187, 255),
+        ("PLUM2", 238, 174, 238),
+        ("PLUM3", 205, 150, 205),
+        ("PLUM4", 139, 102, 139),
+        ("POWDERBLUE", 176, 224, 230),
+        ("PURPLE", 160, 32, 240),
+        ("PURPLE1", 155, 48, 255),
+        ("PURPLE2", 145, 44, 238),
+        ("PURPLE3", 125, 38, 205),
+        ("PURPLE4", 85, 26, 139),
+        ("PY_COLOR", 240, 255, 210),
+        ("RED", 255, 0, 0),
+        ("RED1", 255, 0, 0),
+        ("RED2", 238, 0, 0),
+        ("RED3", 205, 0, 0),
+        ("RED4", 139, 0, 0),
+        ("ROSYBROWN", 188, 143, 143),
+        ("ROSYBROWN1", 255, 193, 193),
+        ("ROSYBROWN2", 238, 180, 180),
+        ("ROSYBROWN3", 205, 155, 155),
+        ("ROSYBROWN4", 139, 105, 105),
+        ("ROYALBLUE", 65, 105, 225),
+        ("ROYALBLUE1", 72, 118, 255),
+        ("ROYALBLUE2", 67, 110, 238),
+        ("ROYALBLUE3", 58, 95, 205),
+        ("ROYALBLUE4", 39, 64, 139),
+        ("SADDLEBROWN", 139, 69, 19),
+        ("SALMON", 250, 128, 114),
+        ("SALMON1", 255, 140, 105),
+        ("SALMON2", 238, 130, 98),
+        ("SALMON3", 205, 112, 84),
+        ("SALMON4", 139, 76, 57),
+        ("SANDYBROWN", 244, 164, 96),
+        ("SEAGREEN", 46, 139, 87),
+        ("SEAGREEN1", 84, 255, 159),
+        ("SEAGREEN2", 78, 238, 148),
+        ("SEAGREEN3", 67, 205, 128),
+        ("SEAGREEN4", 46, 139, 87),
+        ("SEASHELL", 255, 245, 238),
+        ("SEASHELL1", 255, 245, 238),
+        ("SEASHELL2", 238, 229, 222),
+        ("SEASHELL3", 205, 197, 191),
+        ("SEASHELL4", 139, 134, 130),
+        ("SIENNA", 160, 82, 45),
+        ("SIENNA1", 255, 130, 71),
+        ("SIENNA2", 238, 121, 66),
+        ("SIENNA3", 205, 104, 57),
+        ("SIENNA4", 139, 71, 38),
+        ("SKYBLUE", 135, 206, 235),
+        ("SKYBLUE1", 135, 206, 255),
+        ("SKYBLUE2", 126, 192, 238),
+        ("SKYBLUE3", 108, 166, 205),
+        ("SKYBLUE4", 74, 112, 139),
+        ("SLATEBLUE", 106, 90, 205),
+        ("SLATEBLUE1", 131, 111, 255),
+        ("SLATEBLUE2", 122, 103, 238),
+        ("SLATEBLUE3", 105, 89, 205),
+        ("SLATEBLUE4", 71, 60, 139),
+        ("SLATEGRAY", 112, 128, 144),
+        ("SNOW", 255, 250, 250),
+        ("SNOW1", 255, 250, 250),
+        ("SNOW2", 238, 233, 233),
+        ("SNOW3", 205, 201, 201),
+        ("SNOW4", 139, 137, 137),
+        ("SPRINGGREEN", 0, 255, 127),
+        ("SPRINGGREEN1", 0, 255, 127),
+        ("SPRINGGREEN2", 0, 238, 118),
+        ("SPRINGGREEN3", 0, 205, 102),
+        ("SPRINGGREEN4", 0, 139, 69),
+        ("STEELBLUE", 70, 130, 180),
+        ("STEELBLUE1", 99, 184, 255),
+        ("STEELBLUE2", 92, 172, 238),
+        ("STEELBLUE3", 79, 148, 205),
+        ("STEELBLUE4", 54, 100, 139),
+        ("TAN", 210, 180, 140),
+        ("TAN1", 255, 165, 79),
+        ("TAN2", 238, 154, 73),
+        ("TAN3", 205, 133, 63),
+        ("TAN4", 139, 90, 43),
+        ("THISTLE", 216, 191, 216),
+        ("THISTLE1", 255, 225, 255),
+        ("THISTLE2", 238, 210, 238),
+        ("THISTLE3", 205, 181, 205),
+        ("THISTLE4", 139, 123, 139),
+        ("TOMATO", 255, 99, 71),
+        ("TOMATO1", 255, 99, 71),
+        ("TOMATO2", 238, 92, 66),
+        ("TOMATO3", 205, 79, 57),
+        ("TOMATO4", 139, 54, 38),
+        ("TURQUOISE", 64, 224, 208),
+        ("TURQUOISE1", 0, 245, 255),
+        ("TURQUOISE2", 0, 229, 238),
+        ("TURQUOISE3", 0, 197, 205),
+        ("TURQUOISE4", 0, 134, 139),
+        ("VIOLET", 238, 130, 238),
+        ("VIOLETRED", 208, 32, 144),
+        ("VIOLETRED1", 255, 62, 150),
+        ("VIOLETRED2", 238, 58, 140),
+        ("VIOLETRED3", 205, 50, 120),
+        ("VIOLETRED4", 139, 34, 82),
+        ("WHEAT", 245, 222, 179),
+        ("WHEAT1", 255, 231, 186),
+        ("WHEAT2", 238, 216, 174),
+        ("WHEAT3", 205, 186, 150),
+        ("WHEAT4", 139, 126, 102),
+        ("WHITE", 255, 255, 255),
+        ("WHITESMOKE", 245, 245, 245),
+        ("YELLOW", 255, 255, 0),
+        ("YELLOW1", 255, 255, 0),
+        ("YELLOW2", 238, 238, 0),
+        ("YELLOW3", 205, 205, 0),
+        ("YELLOW4", 139, 139, 0),
+        ("YELLOWGREEN", 154, 205, 50),
+    ]
+
+
+def getColorInfoDict():
+    """
+    Returns a dict mapping lower-case colour names to (R, G, B) tuples.
+    :rtype: dict
+    """
+
+    d = {}
+    for item in getColorInfoList():
+        d[item[0].lower()] = item[1:]
+    return d
+
+
+def getColor(name):
+    """Retrieve RGB color in PDF format by name.
+
+    Returns:
+        a triple of floats in the range 0 to 1. If the name is not found, white (1, 1, 1) is returned.
+    """
+    try:
+        c = getColorInfoList()[getColorList().index(name.upper())]
+        return (c[1] / 255.0, c[2] / 255.0, c[3] / 255.0)
+    except (ValueError, AttributeError):  # unknown name or non-string input
+        return (1, 1, 1)
+
+
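`getColor()` scans the colour table with `list.index()` on every call and converts the 0-255 integers to the 0-1 floats PDF expects. A dictionary keyed by lower-case name, like the one `getColorInfoDict()` builds, gives the same result with a constant-time lookup. The sketch below uses a three-entry excerpt of the table so that it runs on its own; `get_color` is a hypothetical name.

```python
from typing import Dict, Tuple

# Three entries copied from the colour table above, for a self-contained demo.
_SAMPLE_TABLE = [
    ("ALICEBLUE", 240, 248, 255),
    ("MUPDFBLUE", 37, 114, 172),
    ("RED", 255, 0, 0),
]

# Same shape as getColorInfoDict(): lower-case name -> (R, G, B) in 0..255.
_COLORS: Dict[str, Tuple[int, int, int]] = {
    name.lower(): (r, g, b) for name, r, g, b in _SAMPLE_TABLE
}


def get_color(name: str) -> Tuple[float, float, float]:
    """Return the PDF colour (floats in 0..1) for `name`, or white if unknown."""
    rgb = _COLORS.get(str(name).lower())
    if rgb is None:
        return (1, 1, 1)
    return (rgb[0] / 255.0, rgb[1] / 255.0, rgb[2] / 255.0)


print(get_color("Red"))      # (1.0, 0.0, 0.0)
print(get_color("no-such"))  # (1, 1, 1)
```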
+def getColorHSV(name):
+    """Retrieve the hue, saturation, value triple of a color name.
+
+    Returns:
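+        a triple (hue in degrees, saturation in percent, value in percent).
+        Hue and saturation are integers; value is rounded to one decimal.
+        If the name is not found, (-1, -1, -1) is returned.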
+    """
+    try:
+        x = getColorInfoList()[getColorList().index(name.upper())]
+    except (ValueError, AttributeError):  # unknown name or non-string input
+        return (-1, -1, -1)
+
+    r = x[1] / 255.0
+    g = x[2] / 255.0
+    b = x[3] / 255.0
+    cmax = max(r, g, b)
+    V = round(cmax * 100, 1)
+    cmin = min(r, g, b)
+    delta = cmax - cmin
+    if delta == 0:
+        hue = 0
+    elif cmax == r:
+        hue = 60.0 * (((g - b) / delta) % 6)
+    elif cmax == g:
+        hue = 60.0 * (((b - r) / delta) + 2)
+    else:
+        hue = 60.0 * (((r - g) / delta) + 4)
+
+    H = int(round(hue))
+
+    if cmax == 0:
+        sat = 0
+    else:
+        sat = delta / cmax
+    S = int(round(sat * 100))
+
+    return (H, S, V)
+
+
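`getColorHSV()` implements the standard RGB-to-HSV conversion by hand. As a cross-check, Python's standard `colorsys.rgb_to_hsv` yields the same triple once hue is scaled to degrees and saturation and value to percent, rounded the way `getColorHSV()` rounds them. The function name `hsv_from_rgb255` is illustrative; the sample colour is MUPDFBLUE (37, 114, 172) from the table above.

```python
import colorsys
from typing import Tuple


def hsv_from_rgb255(r: int, g: int, b: int) -> Tuple[int, int, float]:
    """HSV as (degrees, percent, percent) with getColorHSV's rounding."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = int(round(h * 360))  # colorsys returns hue in 0..1
    sat = int(round(s * 100))
    val = round(v * 100, 1)
    return hue, sat, val


print(hsv_from_rgb255(37, 114, 172))  # (206, 78, 67.5)
print(hsv_from_rgb255(255, 0, 0))     # (0, 100, 100.0)
```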
+def getCharWidths(doc, xref, limit=256, idx=0):
+    """Get list of glyph information of a font.
+
+    Notes:
+        The font must be specified by its xref number. If the font has
+        already been processed, it is recorded in doc.FontInfos; otherwise
+        a new entry is added there.
+        The function returns the font's glyphs as a list of (glyph, width)
+        tuples, where glyph is an integer controlling the character's
+        appearance and width is a float controlling its spacing:
+        width * fontsize is the actual space taken.
+        For 'simple' fonts, glyph == ord(char) usually holds. Exceptions
+        are 'Symbol' and 'ZapfDingbats', for which the data is provided
+        directly here. For CJK fonts, None is returned.
+    """
+    fontinfo = CheckFontInfo(doc, xref)
+    if fontinfo is None:  # not recorded yet: create it
+        name, ext, stype, _ = doc.extractFont(xref, info_only=True)
+        fontdict = {"name": name, "type": stype, "ext": ext}
+
+        if ext == "":
+            raise ValueError("xref is not a font")
+
+        # check for 'simple' fonts
+        if stype in ("Type1", "MMType1", "TrueType"):
+            simple = True
+        else:
+            simple = False
+
+        # check for CJK fonts
+        if name in ("Fangti", "Ming"):
+            ordering = 0
+        elif name in ("Heiti", "Song"):
+            ordering = 1
+        elif name in ("Gothic", "Mincho"):
+            ordering = 2
+        elif name in ("Dotum", "Batang"):
+            ordering = 3
+        else:
+            ordering = -1
+
+        fontdict["simple"] = simple
+
+        if name == "ZapfDingbats":
+            glyphs = zapf_glyphs
+        elif name == "Symbol":
+            glyphs = symbol_glyphs
+        else:
+            glyphs = None
+
+        fontdict["glyphs"] = glyphs
+        fontdict["ordering"] = ordering
+        fontinfo = [xref, fontdict]
+        doc.FontInfos.append(fontinfo)
+    else:
+        fontdict = fontinfo[1]
+        glyphs = fontdict["glyphs"]
+        simple = fontdict["simple"]
+        ordering = fontdict["ordering"]
+
+    if glyphs is None:
+        oldlimit = 0
+    else:
+        oldlimit = len(glyphs)
+
+    mylimit = max(256, limit)
+
+    if mylimit <= oldlimit:
+        return glyphs
+
+    if ordering < 0:  # not a CJK font
+        glyphs = doc._getCharWidths(
+            xref, fontdict["name"], fontdict["ext"], fontdict["ordering"], mylimit, idx
+        )
+    else:  # CJK fonts use char codes and width = 1
+        glyphs = None
+
+    fontdict["glyphs"] = glyphs
+    fontinfo[1] = fontdict
+    UpdateFontInfo(doc, fontinfo)
+
+    return glyphs
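+
+
+# Usage sketch (not part of the original module; assumes 'doc' is an open
+# PDF fitz.Document and 'xref' the xref of a simple, non-CJK font used in
+# it). A string's width in points is the sum of its glyph widths times the
+# font size, which is also how Shape.insertTextbox measures words:
+#
+#     glyphs = doc.getCharWidths(xref)  # list of (glyph, width) tuples
+#     fontsize = 11
+#     text_width = sum(glyphs[ord(c)][1] for c in "Hello") * fontsize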
+
+
+class Shape(object):
+    """Create a new shape."""
+
+    @staticmethod
+    def horizontal_angle(C, P):
+        """Return the angle to the horizontal for the connection from C to P.
+        This uses the arcsine function and resolves its inherent ambiguity by
+        looking up the quadrant in which the vector S = P - C is located.
+        """
+        S = Point(P - C).unit  # unit vector 'C' -> 'P'
+        alfa = math.asin(abs(S.y))  # absolute angle from horizontal
+        if S.x < 0:  # make arcsin result unique
+            if S.y <= 0:  # bottom-left
+                alfa = -(math.pi - alfa)
+            else:  # top-left
+                alfa = math.pi - alfa
+        else:
+            if S.y >= 0:  # top-right
+                pass
+            else:  # bottom-right
+                alfa = -alfa
+        return alfa
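+
+    # Sketch (not part of the original code): for a unit vector S, the
+    # quadrant logic of horizontal_angle agrees with math.atan2(S.y, S.x),
+    # except that a vector along the negative x-axis may yield -pi where
+    # atan2 returns +pi (the same direction). Plain-float check:
+    #
+    #     import math
+    #     x, y = -0.6, -0.8                      # bottom-left unit vector
+    #     alfa = -(math.pi - math.asin(abs(y)))  # branch taken above
+    #     assert abs(alfa - math.atan2(y, x)) < 1e-12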
+
+    def __init__(self, page):
+        CheckParent(page)
+        self.page = page
+        self.doc = page.parent
+        if not self.doc.isPDF:
+            raise ValueError("not a PDF")
+        self.height = page.MediaBoxSize.y
+        self.width = page.MediaBoxSize.x
+        self.x = page.CropBoxPosition.x
+        self.y = page.CropBoxPosition.y
+
+        self.pctm = page.transformationMatrix  # page transf. matrix
+        self.ipctm = ~self.pctm  # inverted transf. matrix
+
+        self.draw_cont = ""
+        self.text_cont = ""
+        self.totalcont = ""
+        self.lastPoint = None
+        self.rect = None
+
+    def updateRect(self, x):
+        if self.rect is None:
+            if len(x) == 2:
+                self.rect = Rect(x, x)
+            else:
+                self.rect = Rect(x)
+
+        else:
+            if len(x) == 2:
+                x = Point(x)
+                self.rect.x0 = min(self.rect.x0, x.x)
+                self.rect.y0 = min(self.rect.y0, x.y)
+                self.rect.x1 = max(self.rect.x1, x.x)
+                self.rect.y1 = max(self.rect.y1, x.y)
+            else:
+                x = Rect(x)
+                self.rect.x0 = min(self.rect.x0, x.x0)
+                self.rect.y0 = min(self.rect.y0, x.y0)
+                self.rect.x1 = max(self.rect.x1, x.x1)
+                self.rect.y1 = max(self.rect.y1, x.y1)
+
+    def drawLine(self, p1, p2):
+        """Draw a line between two points.
+        """
+        p1 = Point(p1)
+        p2 = Point(p2)
+        if not (self.lastPoint == p1):
+            self.draw_cont += "%g %g m\n" % JM_TUPLE(p1 * self.ipctm)
+            self.lastPoint = p1
+            self.updateRect(p1)
+
+        self.draw_cont += "%g %g l\n" % JM_TUPLE(p2 * self.ipctm)
+        self.updateRect(p2)
+        self.lastPoint = p2
+        return self.lastPoint
+
+    def drawPolyline(self, points):
+        """Draw several connected line segments.
+        """
+        for i, p in enumerate(points):
+            if i == 0:
+                if not (self.lastPoint == Point(p)):
+                    self.draw_cont += "%g %g m\n" % JM_TUPLE(Point(p) * self.ipctm)
+                    self.lastPoint = Point(p)
+            else:
+                self.draw_cont += "%g %g l\n" % JM_TUPLE(Point(p) * self.ipctm)
+            self.updateRect(p)
+
+        self.lastPoint = Point(points[-1])
+        return self.lastPoint
+
+    def drawBezier(self, p1, p2, p3, p4):
+        """Draw a standard cubic Bezier curve.
+        """
+        p1 = Point(p1)
+        p2 = Point(p2)
+        p3 = Point(p3)
+        p4 = Point(p4)
+        if not (self.lastPoint == p1):
+            self.draw_cont += "%g %g m\n" % JM_TUPLE(p1 * self.ipctm)
+        self.draw_cont += "%g %g %g %g %g %g c\n" % JM_TUPLE(
+            list(p2 * self.ipctm) + list(p3 * self.ipctm) + list(p4 * self.ipctm)
+        )
+        self.updateRect(p1)
+        self.updateRect(p2)
+        self.updateRect(p3)
+        self.updateRect(p4)
+        self.lastPoint = p4
+        return self.lastPoint
+
+    def drawOval(self, tetra):
+        """Draw an ellipse inside a rectangle or quadrilateral (rect-like or quad-like).
+        """
+        if len(tetra) != 4:
+            raise ValueError("invalid arg length")
+        if hasattr(tetra[0], "__float__"):
+            q = Rect(tetra).quad
+        else:
+            q = Quad(tetra)
+
+        mt = q.ul + (q.ur - q.ul) * 0.5
+        mr = q.ur + (q.lr - q.ur) * 0.5
+        mb = q.ll + (q.lr - q.ll) * 0.5
+        ml = q.ul + (q.ll - q.ul) * 0.5
+        if not (self.lastPoint == ml):
+            self.draw_cont += "%g %g m\n" % JM_TUPLE(ml * self.ipctm)
+            self.lastPoint = ml
+        self.drawCurve(ml, q.ll, mb)
+        self.drawCurve(mb, q.lr, mr)
+        self.drawCurve(mr, q.ur, mt)
+        self.drawCurve(mt, q.ul, ml)
+        self.updateRect(q.rect)
+        self.lastPoint = ml
+        return self.lastPoint
+
+    def drawCircle(self, center, radius):
+        """Draw a circle given its center and radius.
+        """
+        if not radius > EPSILON:
+            raise ValueError("radius must be positive")
+        center = Point(center)
+        p1 = center - (radius, 0)
+        return self.drawSector(center, p1, 360, fullSector=False)
+
+    def drawCurve(self, p1, p2, p3):
+        """Draw a curve between points using one control point.
+        """
+        kappa = 0.55228474983
+        p1 = Point(p1)
+        p2 = Point(p2)
+        p3 = Point(p3)
+        k1 = p1 + (p2 - p1) * kappa
+        k2 = p3 + (p2 - p3) * kappa
+        return self.drawBezier(p1, k1, k2, p3)
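+
+    # Note (sketch, not from the original code): kappa appears to be
+    # 4/3 * tan(pi/8) = 4/3 * (sqrt(2) - 1), the usual factor for
+    # approximating a quarter circle by one cubic Bezier segment. So when
+    # p2 is the right-angled corner where the tangents at p1 and p3 meet,
+    # with |p2 - p1| == |p2 - p3| (as in drawOval for a square), drawCurve
+    # draws an approximate quarter circle. Quick check:
+    #
+    #     import math
+    #     k = 4 / 3 * math.tan(math.pi / 8)
+    #     assert abs(k - 4 / 3 * (math.sqrt(2) - 1)) < 1e-12
+    #     assert abs(k - 0.55228474983) < 1e-10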
+
+    def drawSector(self, center, point, beta, fullSector=True):
+        """Draw a circle sector.
+        """
+        center = Point(center)
+        point = Point(point)
+        l3 = "%g %g m\n"
+        l4 = "%g %g %g %g %g %g c\n"
+        l5 = "%g %g l\n"
+        betar = math.radians(-beta)
+        w360 = math.radians(math.copysign(360, betar)) * (-1)
+        w90 = math.radians(math.copysign(90, betar))
+        w45 = w90 / 2
+        while abs(betar) > 2 * math.pi:
+            betar += w360  # bring angle below 360 degrees
+        if not (self.lastPoint == point):
+            self.draw_cont += l3 % JM_TUPLE(point * self.ipctm)
+            self.lastPoint = point
+        Q = Point(0, 0)  # just make sure it exists
+        C = center
+        P = point
+        S = P - C  # vector 'center' -> 'point'
+        rad = abs(S)  # circle radius
+
+        if not rad > EPSILON:
+            raise ValueError("radius must be positive")
+
+        alfa = self.horizontal_angle(center, point)
+        while abs(betar) > abs(w90):  # draw 90 degree arcs
+            q1 = C.x + math.cos(alfa + w90) * rad
+            q2 = C.y + math.sin(alfa + w90) * rad
+            Q = Point(q1, q2)  # the arc's end point
+            r1 = C.x + math.cos(alfa + w45) * rad / math.cos(w45)
+            r2 = C.y + math.sin(alfa + w45) * rad / math.cos(w45)
+            R = Point(r1, r2)  # crossing point of tangents
+            kappah = (1 - math.cos(w45)) * 4 / 3 / abs(R - Q)
+            kappa = kappah * abs(P - Q)
+            cp1 = P + (R - P) * kappa  # control point 1
+            cp2 = Q + (R - Q) * kappa  # control point 2
+            self.draw_cont += l4 % JM_TUPLE(
+                list(cp1 * self.ipctm) + list(cp2 * self.ipctm) + list(Q * self.ipctm)
+            )
+
+            betar -= w90  # reduce remaining angle by 90 deg
+            alfa += w90  # advance start angle by 90 deg
+            P = Q  # advance to arc end point
+        # draw (remaining) arc
+        if abs(betar) > 1e-3:  # significant degrees left?
+            beta2 = betar / 2
+            q1 = C.x + math.cos(alfa + betar) * rad
+            q2 = C.y + math.sin(alfa + betar) * rad
+            Q = Point(q1, q2)  # the arc's end point
+            r1 = C.x + math.cos(alfa + beta2) * rad / math.cos(beta2)
+            r2 = C.y + math.sin(alfa + beta2) * rad / math.cos(beta2)
+            R = Point(r1, r2)  # crossing point of tangents
+            # kappa height is 4/3 of segment height
+            kappah = (1 - math.cos(beta2)) * 4 / 3 / abs(R - Q)  # kappa height
+            kappa = kappah * abs(P - Q) / (1 - math.cos(betar))
+            cp1 = P + (R - P) * kappa  # control point 1
+            cp2 = Q + (R - Q) * kappa  # control point 2
+            self.draw_cont += l4 % JM_TUPLE(
+                list(cp1 * self.ipctm) + list(cp2 * self.ipctm) + list(Q * self.ipctm)
+            )
+        if fullSector:
+            self.draw_cont += l3 % JM_TUPLE(point * self.ipctm)
+            self.draw_cont += l5 % JM_TUPLE(center * self.ipctm)
+            self.draw_cont += l5 % JM_TUPLE(Q * self.ipctm)
+        self.lastPoint = Q
+        return self.lastPoint
+
+    def drawRect(self, rect):
+        """Draw a rectangle.
+        """
+        r = Rect(rect)
+        self.draw_cont += "%g %g %g %g re\n" % JM_TUPLE(
+            list(r.bl * self.ipctm) + [r.width, r.height]
+        )
+        self.updateRect(r)
+        self.lastPoint = r.tl
+        return self.lastPoint
+
+    def drawQuad(self, quad):
+        """Draw a Quad.
+        """
+        q = Quad(quad)
+        return self.drawPolyline([q.ul, q.ll, q.lr, q.ur, q.ul])
+
+    def drawZigzag(self, p1, p2, breadth=2):
+        """Draw a zig-zagged line from p1 to p2.
+        """
+        p1 = Point(p1)
+        p2 = Point(p2)
+        S = p2 - p1  # vector from start to end
+        rad = abs(S)  # distance of points
+        cnt = 4 * int(round(rad / (4 * breadth), 0))  # always take full phases
+        if cnt < 4:
+            raise ValueError("points too close")
+        mb = rad / cnt  # revised breadth
+        matrix = TOOLS._hor_matrix(p1, p2)  # normalize line to x-axis
+        i_mat = ~matrix  # get original position
+        points = []  # stores the corner points
+        for i in range(1, cnt):
+            if i % 4 == 1:  # point "above" connection
+                p = Point(i, -1) * mb
+            elif i % 4 == 3:  # point "below" connection
+                p = Point(i, 1) * mb
+            else:  # ignore others
+                continue
+            points.append(p * i_mat)
+        self.drawPolyline([p1] + points + [p2])  # add start and end points
+        return p2
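+
+    # Worked example for drawZigzag (illustrative numbers, not from the
+    # original code): for p1 = (0, 0), p2 = (50, 0) and breadth = 2, we get
+    # rad = 50 and cnt = 4 * round(50 / 8) = 4 * 6 = 24, so the revised
+    # breadth is mb = 50 / 24 (about 2.083). Corner points are generated for
+    # i = 1, 3, 5, ..., 23, alternating between y = -mb (i % 4 == 1) and
+    # y = +mb (i % 4 == 3) on the normalized x-axis, and are then mapped
+    # back onto the original line with i_mat.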
+
+    def drawSquiggle(self, p1, p2, breadth=2):
+        """Draw a squiggly line from p1 to p2.
+        """
+        p1 = Point(p1)
+        p2 = Point(p2)
+        S = p2 - p1  # vector from start to end
+        rad = abs(S)  # distance of points
+        cnt = 4 * int(round(rad / (4 * breadth), 0))  # always take full phases
+        if cnt < 4:
+            raise ValueError("points too close")
+        mb = rad / cnt  # revised breadth
+        matrix = TOOLS._hor_matrix(p1, p2)  # normalize line to x-axis
+        i_mat = ~matrix  # get original position
+        k = 2.4142135623765633  # y of drawCurve helper point
+
+        points = []  # stores the curve points
+        for i in range(1, cnt):
+            if i % 4 == 1:  # point "above" connection
+                p = Point(i, -k) * mb
+            elif i % 4 == 3:  # point "below" connection
+                p = Point(i, k) * mb
+            else:  # else on connection line
+                p = Point(i, 0) * mb
+            points.append(p * i_mat)
+
+        points = [p1] + points + [p2]
+        cnt = len(points)
+        i = 0
+        while i + 2 < cnt:
+            self.drawCurve(points[i], points[i + 1], points[i + 2])
+            i += 2
+        return p2
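+
+    # Note (sketch, not from the original code): k appears to equal
+    # 1 / (3/4 * kappa), roughly 1 + sqrt(2). Each drawCurve call above runs
+    # from a point on the line, via a helper point at height k * mb, back to
+    # the line, so both Bezier control points sit at height kappa * k * mb.
+    # The cubic then peaks at t = 0.5 with height 3/4 * kappa * k * mb, i.e.
+    # the squiggle's amplitude is about mb:
+    #
+    #     kappa = 0.55228474983   # value used in drawCurve
+    #     k = 2.4142135623765633  # value used in drawSquiggle
+    #     assert abs(0.75 * kappa * k - 1) < 1e-9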
+
+    # ==============================================================================
+    # Shape.insertText
+    # ==============================================================================
+    def insertText(
+        self,
+        point,
+        buffer,
+        fontsize=11,
+        fontname="helv",
+        fontfile=None,
+        set_simple=0,
+        encoding=0,
+        color=None,
+        fill=None,
+        render_mode=0,
+        border_width=1,
+        rotate=0,
+        morph=None,
+    ):
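+        """Insert text lines, starting at a given point.
+
+        Notes:
+            'point' is the start of the first line's baseline. 'buffer' may be
+            a string (split into lines at line breaks) or a sequence of strings.
+            Lines that no longer fit on the page are not written.
+        Returns:
+            The number of lines written (0 if there is nothing to insert).
+        """
+        # ensure 'text' is a non-empty list of strings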
+        if not bool(buffer):
+            return 0
+
+        if type(buffer) not in (list, tuple):
+            text = buffer.splitlines()
+        else:
+            text = buffer
+
+        if not len(text) > 0:
+            return 0
+
+        point = Point(point)
+        try:
+            maxcode = max([ord(c) for c in " ".join(text)])
+        except ValueError:  # no characters at all
+            return 0
+
+        # ensure valid 'fontname'
+        fname = fontname
+        if fname.startswith("/"):
+            fname = fname[1:]
+
+        xref = self.page.insertFont(
+            fontname=fname, fontfile=fontfile, encoding=encoding, set_simple=set_simple
+        )
+        fontinfo = CheckFontInfo(self.doc, xref)
+
+        fontdict = fontinfo[1]
+        ordering = fontdict["ordering"]
+        simple = fontdict["simple"]
+        bfname = fontdict["name"]
+        if maxcode > 255:
+            glyphs = self.doc.getCharWidths(xref, maxcode + 1)
+        else:
+            glyphs = fontdict["glyphs"]
+
+        tab = []
+        for t in text:
+            if simple and bfname not in ("Symbol", "ZapfDingbats"):
+                g = None
+            else:
+                g = glyphs
+            tab.append(getTJstr(t, g, simple, ordering))
+        text = tab
+
+        color_str = ColorCode(color, "c")
+        fill_str = ColorCode(fill, "f")
+        if not fill and render_mode == 0:  # ensure fill color when 0 Tr
+            fill = color
+            fill_str = ColorCode(color, "f")
+
+        morphing = CheckMorph(morph)
+        rot = rotate
+        if rot % 90 != 0:
+            raise ValueError("rotate not multiple of 90")
+
+        while rot < 0:
+            rot += 360
+        rot = rot % 360  # text rotate = 0, 90, 270, 180
+
+        templ1 = "\nq BT\n%s1 0 0 1 %g %g Tm /%s %g Tf "
+        templ2 = "TJ\n0 -%g TD\n"
+        cmp90 = "0 1 -1 0 0 0 cm\n"  # rotates 90 deg counter-clockwise
+        cmm90 = "0 -1 1 0 0 0 cm\n"  # rotates 90 deg clockwise
+        cm180 = "-1 0 0 -1 0 0 cm\n"  # rotates by 180 deg.
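+        # Note (sketch, not from the original code): these strings are
+        # instances of the PDF rotation matrix [cos(a) sin(a) -sin(a) cos(a) 0 0]
+        # for a = 90, -90 and 180 degrees respectively. 'rot_cm' is a
+        # hypothetical helper used only for this illustration:
+        #
+        #     import math
+        #     def rot_cm(deg):
+        #         a = math.radians(deg)
+        #         c, s = int(round(math.cos(a))), int(round(math.sin(a)))
+        #         return "%g %g %g %g 0 0 cm\n" % (c, s, -s, c)
+        #     assert rot_cm(90) == cmp90
+        #     assert rot_cm(-90) == cmm90
+        #     assert rot_cm(180) == cm180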
+        height = self.height
+        width = self.width
+        lheight = fontsize * 1.2  # line height
+        # setting up for standard rotation directions
+        # case rotate = 0
+        if morphing:
+            m1 = Matrix(1, 0, 0, 1, morph[0].x + self.x, height - morph[0].y - self.y)
+            mat = ~m1 * morph[1] * m1
+            cm = "%g %g %g %g %g %g cm\n" % JM_TUPLE(mat)
+        else:
+            cm = ""
+        top = height - point.y - self.y  # start of 1st char
+        left = point.x + self.x  # start of 1st char
+        space = top  # space available
+        headroom = point.y + self.y  # distance to page border
+        if rot == 90:
+            left = height - point.y - self.y
+            top = -point.x - self.x
+            cm += cmp90
+            space = width - abs(top)
+            headroom = point.x + self.x
+
+        elif rot == 270:
+            left = -height + point.y + self.y
+            top = point.x + self.x
+            cm += cmm90
+            space = abs(top)
+            headroom = width - point.x - self.x
+
+        elif rot == 180:
+            left = -point.x - self.x
+            top = -height + point.y + self.y
+            cm += cm180
+            space = abs(point.y + self.y)
+            headroom = height - point.y - self.y
+
+        nres = templ1 % (cm, left, top, fname, fontsize)
+        if render_mode > 0:
+            nres += "%i Tr " % render_mode
+        if border_width != 1:
+            nres += "%g w " % border_width
+        if color is not None:
+            nres += color_str
+        if fill is not None:
+            nres += fill_str
+
+        # =========================================================================
+        #   start text insertion
+        # =========================================================================
+        nres += text[0]
+        nlines = 1  # set output line counter
+        nres += templ2 % lheight  # line 1
+        for i in range(1, len(text)):
+            if space < lheight:
+                break  # no space left on page
+            if i > 1:
+                nres += "\nT* "
+            nres += text[i] + templ2[:2]
+            space -= lheight
+            nlines += 1
+
+        nres += " ET Q\n"
+
+        # =========================================================================
+        #   end of text insertion
+        # =========================================================================
+        # update the /Contents object
+        self.text_cont += nres
+        return nlines
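+
+    # Usage sketch (assumes PyMuPDF 1.17 names such as Document.newPage and
+    # Page.newShape; position and text are made up):
+    #
+    #     import fitz
+    #     doc = fitz.open()                 # new, empty PDF
+    #     page = doc.newPage()
+    #     shape = page.newShape()
+    #     n = shape.insertText((72, 72), "Line 1\nLine 2", fontsize=12)
+    #     shape.commit()
+    #     # n is the number of lines written, here 2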
+
+    # ==============================================================================
+    # Shape.insertTextbox
+    # ==============================================================================
+    def insertTextbox(
+        self,
+        rect,
+        buffer,
+        fontname="helv",
+        fontfile=None,
+        fontsize=11,
+        set_simple=0,
+        encoding=0,
+        color=None,
+        fill=None,
+        expandtabs=1,
+        border_width=1,
+        align=0,
+        render_mode=0,
+        rotate=0,
+        morph=None,
+    ):
+        """ Insert text into a given rectangle.
+
+        Args:
+            rect -- the textbox to fill
+            buffer -- text to be inserted (string or sequence of strings)
+            fontname -- a Base-14 font, font name or '/name'
+            fontfile -- name of a font file
+            fontsize -- font size
+            set_simple -- passed on to Page.insertFont
+            encoding -- passed on to Page.insertFont
+            color -- RGB stroke color triple
+            fill -- RGB fill color triple
+            expandtabs -- tab size passed to str.expandtabs
+            border_width -- thickness of glyph borders
+            align -- 0 = left, 1 = center, 2 = right, 3 = justified
+            render_mode -- text rendering control
+            rotate -- a multiple of 90 degrees
+            morph -- (fixpoint, matrix) tuple for morphing the text
+        Returns:
+            The unused space (>= 0) in the direction of line progress, in
+            points. A negative value is the deficit; nothing is written then.
+        """
+        rect = Rect(rect)
+        if rect.isEmpty or rect.isInfinite:
+            raise ValueError("text box must be finite and not empty")
+
+        color_str = ColorCode(color, "c")
+        fill_str = ColorCode(fill, "f")
+        if fill is None and render_mode == 0:  # ensure fill color for 0 Tr
+            fill = color
+            fill_str = ColorCode(color, "f")
+
+        if rotate % 90 != 0:
+            raise ValueError("rotate must be multiple of 90")
+
+        rot = rotate
+        while rot < 0:
+            rot += 360
+        rot = rot % 360
+
+        # is buffer worth dealing with?
+        if not bool(buffer):
+            return rect.height if rot in (0, 180) else rect.width
+
+        cmp90 = "0 1 -1 0 0 0 cm\n"  # rotates counter-clockwise
+        cmm90 = "0 -1 1 0 0 0 cm\n"  # rotates clockwise
+        cm180 = "-1 0 0 -1 0 0 cm\n"  # rotates by 180 deg.
+        height = self.height
+
+        fname = fontname
+        if fname.startswith("/"):
+            fname = fname[1:]
+
+        xref = self.page.insertFont(
+            fontname=fname, fontfile=fontfile, encoding=encoding, set_simple=set_simple
+        )
+        fontinfo = CheckFontInfo(self.doc, xref)
+
+        fontdict = fontinfo[1]
+        ordering = fontdict["ordering"]
+        simple = fontdict["simple"]
+        glyphs = fontdict["glyphs"]
+        bfname = fontdict["name"]
+
+        # create a list from buffer, split into its lines
+        if type(buffer) in (list, tuple):
+            t0 = "\n".join(buffer)
+        else:
+            t0 = buffer
+
+        maxcode = max([ord(c) for c in t0])
+        # replace invalid char codes for simple fonts
+        if simple and maxcode > 255:
+            t0 = "".join([c if ord(c) < 256 else "?" for c in t0])
+
+        t0 = t0.splitlines()
+
+        glyphs = self.doc.getCharWidths(xref, maxcode + 1)
+        if simple and bfname not in ("Symbol", "ZapfDingbats"):
+            tj_glyphs = None
+        else:
+            tj_glyphs = glyphs
+
+        # ----------------------------------------------------------------------
+        # calculate pixel length of a string
+        # ----------------------------------------------------------------------
+        def pixlen(x):
+            """Calculate pixel length of x."""
+            if ordering < 0:
+                return sum([glyphs[ord(c)][1] for c in x]) * fontsize
+            else:
+                return len(x) * fontsize
+
+        # ----------------------------------------------------------------------
+
+        if ordering < 0:
+            blen = glyphs[32][1] * fontsize  # pixel size of space character
+        else:
+            blen = fontsize
+
+        text = ""  # output buffer
+        lheight = fontsize * 1.2  # line height
+        if CheckMorph(morph):
+            m1 = Matrix(
+                1, 0, 0, 1, morph[0].x + self.x, self.height - morph[0].y - self.y
+            )
+            mat = ~m1 * morph[1] * m1
+            cm = "%g %g %g %g %g %g cm\n" % JM_TUPLE(mat)
+        else:
+            cm = ""
+
+        # ---------------------------------------------------------------------------
+        # adjust for text orientation / rotation
+        # ---------------------------------------------------------------------------
+        progr = 1  # direction of line progress
+        c_pnt = Point(0, fontsize)  # used for line progress
+        if rot == 0:  # normal orientation
+            point = rect.tl + c_pnt  # line 1 is 'fontsize' below top
+            pos = point.y + self.y  # y of first line
+            maxwidth = rect.width  # pixels available in one line
+            maxpos = rect.y1 + self.y  # lines must not be below this
+
+        elif rot == 90:  # rotate counter clockwise
+            c_pnt = Point(fontsize, 0)  # progress in x-direction
+            point = rect.bl + c_pnt  # line 1 'fontsize' away from left
+            pos = point.x + self.x  # position of first line
+            maxwidth = rect.height  # pixels available in one line
+            maxpos = rect.x1 + self.x  # lines must not be right of this
+            cm += cmp90
+
+        elif rot == 180:  # text upside down
+            c_pnt = -Point(0, fontsize)  # progress upwards in y direction
+            point = rect.br + c_pnt  # line 1 'fontsize' above bottom
+            pos = point.y + self.y  # position of first line
+            maxwidth = rect.width  # pixels available in one line
+            progr = -1  # subtract lheight for next line
+            maxpos = rect.y0 + self.y  # lines must not be above this
+            cm += cm180
+
+        else:  # rotate clockwise (270 or -90)
+            c_pnt = -Point(fontsize, 0)  # progress from right to left
+            point = rect.tr + c_pnt  # line 1 'fontsize' left of right
+            pos = point.x + self.x  # position of first line
+            maxwidth = rect.height  # pixels available in one line
+            progr = -1  # subtract lheight for next line
+            maxpos = rect.x0 + self.x  # lines must not be left of this
+            cm += cmm90
+
+        # =======================================================================
+        # line loop
+        # =======================================================================
+        just_tab = []  # 'justify' indicators per line
+
+        for i, line in enumerate(t0):
+            line_t = line.expandtabs(expandtabs).split(" ")  # split into words
+            lbuff = ""  # init line buffer
+            rest = maxwidth  # available line pixels
+            # ===================================================================
+            # word loop
+            # ===================================================================
+            for word in line_t:
+                pl_w = pixlen(word)  # pixel len of word
+                if rest >= pl_w:  # will it fit on the line?
+                    lbuff += word + " "  # yes, and append word
+                    rest -= pl_w + blen  # update available line space
+                    continue
+                # word won't fit - output line (if not empty)
+                if len(lbuff) > 0:
+                    lbuff = lbuff.rstrip() + "\n"  # line full, append line break
+                    text += lbuff  # append to total text
+                    pos += lheight * progr  # increase line position
+                    just_tab.append(True)  # line is justify candidate
+                    lbuff = ""  # re-init line buffer
+                rest = maxwidth  # re-init avail. space
+                if pl_w <= maxwidth:  # word shorter than 1 line?
+                    lbuff = word + " "  # start the line with it
+                    rest = maxwidth - pl_w - blen  # update free space
+                    continue
+                # long word: split across multiple lines - char by char ...
+                if len(just_tab) > 0:
+                    just_tab[-1] = False  # reset justify indicator
+                for c in word:
+                    if pixlen(lbuff) <= maxwidth - pixlen(c):
+                        lbuff += c
+                    else:  # line full
+                        lbuff += "\n"  # close line
+                        text += lbuff  # append to text
+                        pos += lheight * progr  # increase line position
+                        just_tab.append(False)  # do not justify line
+                        lbuff = c  # start new line with this char
+                lbuff += " "  # finish long word
+                rest = maxwidth - pixlen(lbuff)  # long word stored
+
+            if lbuff != "":  # unprocessed line content?
+                text += lbuff.rstrip()  # append to text
+                just_tab.append(False)  # do not justify line
+            if i < len(t0) - 1:  # not the last line?
+                text += "\n"  # insert line break
+                pos += lheight * progr  # increase line position
+
+        more = (pos - maxpos) * progr  # difference to rect size limit
+
+        if more > EPSILON:  # landed too much outside rect
+            return (-1) * more  # return deficit, don't output
+
+        more = abs(more)
+        if more < EPSILON:
+            more = 0  # don't bother with epsilons
+        nres = "\nq BT\n" + cm  # initialize output buffer
+        templ = "1 0 0 1 %g %g Tm /%s %g Tf "
+        # center, right, justify: output each line with its own specifics
+        spacing = 0
+        text_t = text.splitlines()  # split text in lines again
+        for i, t in enumerate(text_t):
+            pl = maxwidth - pixlen(t)  # length of empty line part
+            pnt = point + c_pnt * (i * 1.2)  # text start of line
+            if align == 1:  # center: right shift by half width
+                if rot in (0, 180):
+                    pnt = pnt + Point(pl / 2, 0) * progr
+                else:
+                    pnt = pnt - Point(0, pl / 2) * progr
+            elif align == 2:  # right: right shift by full width
+                if rot in (0, 180):
+                    pnt = pnt + Point(pl, 0) * progr
+                else:
+                    pnt = pnt - Point(0, pl) * progr
+            elif align == 3:  # justify
+                spaces = t.count(" ")  # number of spaces in line
+                if spaces > 0 and just_tab[i]:  # if any, and we may justify
+                    spacing = pl / spaces  # make every space this much larger
+                else:
+                    spacing = 0  # keep normal space length
+            top = height - pnt.y - self.y
+            left = pnt.x + self.x
+            if rot == 90:
+                left = height - pnt.y - self.y
+                top = -pnt.x - self.x
+            elif rot == 270:
+                left = -height + pnt.y + self.y
+                top = pnt.x + self.x
+            elif rot == 180:
+                left = -pnt.x - self.x
+                top = -height + pnt.y + self.y
+
+            nres += templ % (left, top, fname, fontsize)
+            if render_mode > 0:
+                nres += "%i Tr " % render_mode
+            if spacing != 0:
+                nres += "%g Tw " % spacing
+            if color is not None:
+                nres += color_str
+            if fill is not None:
+                nres += fill_str
+            if border_width != 1:
+                nres += "%g w " % border_width
+            nres += "%sTJ\n" % getTJstr(t, tj_glyphs, simple, ordering)
+
+        nres += "ET Q\n"
+
+        self.text_cont += nres
+        self.updateRect(rect)
+        return more
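+
+    # Usage sketch (assumes 'shape' was created by Page.newShape() in
+    # PyMuPDF 1.17; rectangle and text are made-up values):
+    #
+    #     rect = fitz.Rect(50, 50, 250, 150)
+    #     rc = shape.insertTextbox(rect, "Hello world", fontsize=11, align=1)
+    #     if rc < 0:
+    #         # nothing was written: about -rc more points are needed in the
+    #         # direction of line progress (e.g. a larger rect or smaller font)
+    #         pass
+    #     shape.commit()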
+
+    def finish(
+        self,
+        width=1,
+        color=None,
+        fill=None,
+        lineCap=0,
+        lineJoin=0,
+        roundCap=None,
+        dashes=None,
+        even_odd=False,
+        morph=None,
+        closePath=True,
+    ):
+        """Finish the current drawing segment.
+
+        Notes:
+            Apply stroke and fill colors, dashes, line style and width, or
+            morphing. Also determines whether any open path should be closed
+            by a connecting line to its start point.
+        """
+        if self.draw_cont == "":  # treat empty contents as no-op
+            return
+        if roundCap is not None:
+            warnings.warn(
+                "roundCap replaced by lineCap / lineJoin and removed in next version",
+                DeprecationWarning,
+            )
+            lineCap = lineJoin = roundCap
+
+        if width == 0:  # border color makes no sense then
+            color = None
+        elif color is None:  # vice versa
+            width = 0
+        color_str = ColorCode(color, "c")  # ensure proper color string
+        fill_str = ColorCode(fill, "f")  # ensure proper fill string
+
+        if width not in (0, 1):
+            self.draw_cont += "%g w\n" % width
+
+        if lineCap + lineJoin > 0:
+            self.draw_cont += "%i J %i j\n" % (lineCap, lineJoin)
+
+        if dashes is not None and len(dashes) > 0:
+            self.draw_cont += "%s d\n" % dashes
+
+        if closePath:
+            self.draw_cont += "h\n"
+            self.lastPoint = None
+
+        if color is not None:
+            self.draw_cont += color_str
+
+        if fill is not None:
+            self.draw_cont += fill_str
+            if color is not None:
+                if not even_odd:
+                    self.draw_cont += "B\n"
+                else:
+                    self.draw_cont += "B*\n"
+            else:
+                if not even_odd:
+                    self.draw_cont += "f\n"
+                else:
+                    self.draw_cont += "f*\n"
+        else:
+            self.draw_cont += "S\n"
+
+        if CheckMorph(morph):
+            m1 = Matrix(
+                1, 0, 0, 1, morph[0].x + self.x, self.height - morph[0].y - self.y
+            )
+            mat = ~m1 * morph[1] * m1
+            self.draw_cont = "%g %g %g %g %g %g cm\n" % JM_TUPLE(mat) + self.draw_cont
+
+        self.totalcont += "\nq\n" + self.draw_cont + "Q\n"
+        self.draw_cont = ""
+        self.lastPoint = None
+        return
+
+    def commit(self, overlay=True):
+        """Update the page's /Contents object with Shape data.
+
+        The argument 'overlay' controls whether the data appear in the
+        foreground (default) or in the background.
+        """
+        CheckParent(self.page)  # doc may have died meanwhile
+        self.totalcont += self.text_cont
+
+        if not fitz_py2:  # need bytes if Python > 2
+            self.totalcont = bytes(self.totalcont, "utf-8")
+
+        if self.totalcont != b"":
+            # make /Contents object with dummy stream
+            xref = TOOLS._insert_contents(self.page, b" ", overlay)
+            # update it with potential compression
+            self.doc.updateStream(xref, self.totalcont)
+
+        self.lastPoint = None  # clean up ...
+        self.rect = None  #
+        self.draw_cont = ""  # for possible ...
+        self.text_cont = ""  # ...
+        self.totalcont = ""  # re-use
+        return
+
+
+def apply_redactions(page):
+    """Apply the redaction annotations of the page.
+    """
+
+    def center_rect(annot_rect, text, font, fsize):
+        """Calculate minimal sub-rectangle for the overlay text.
+
+        Notes:
+            Because 'insertTextbox' does not support vertical text centering,
+            we calculate an approximate number of lines here and return a
+            sub-rect with smaller height, which should still be sufficient.
+        Args:
+            annot_rect: the annotation rectangle
+            text: the text to insert.
+            font: the fontname. Must be one of the CJK or Base-14 set, else
+                the rectangle is returned unchanged.
+            fsize: the fontsize
+        Returns:
+            A rectangle to use instead of the annot rectangle.
+        """
+        if not text:
+            return annot_rect
+        try:
+            text_width = getTextlength(text, font, fsize)
+        except ValueError:  # unsupported font
+            return annot_rect
+        line_height = fsize * 1.2
+        limit = annot_rect.width
+        h = math.ceil(text_width / limit) * line_height  # estimate rect height
+        if h >= annot_rect.height:
+            return annot_rect
+        r = annot_rect
+        y = (annot_rect.tl.y + annot_rect.bl.y - h) * 0.5
+        r.y0 = y
+        return r
+
+    CheckParent(page)
+    doc = page.parent
+    if doc.isEncrypted or doc.isClosed:
+        raise ValueError("document closed or encrypted")
+    if not doc.isPDF:
+        raise ValueError("not a PDF")
+
+    redact_annots = []  # storage of annot values
+    for annot in page.annots(types=(PDF_ANNOT_REDACT,)):  # loop redactions
+        redact_annots.append(annot._get_redact_values())  # save annot values
+
+    if redact_annots == []:  # any redactions on this page?
+        return False  # no redactions
+
+    rc = page._apply_redactions()  # call MuPDF redaction process step
+    if not rc:  # should not happen really
+        raise ValueError("Error applying redactions.")
+
+    # now write replacement text in old redact rectangles
+    shape = page.newShape()
+    for redact in redact_annots:
+        annot_rect = redact["rect"]
+        fill = redact["fill"]
+        if fill:
+            shape.drawRect(annot_rect)  # colorize the rect background
+            shape.finish(fill=fill, color=fill)
+        if "text" in redact.keys():  # if we also have text
+            trect = center_rect(  # try finding a vertically centered sub-rect
+                annot_rect, redact["text"], redact["fontname"], redact["fontsize"]
+            )
+            fsize = redact["fontsize"]  # start with stored fontsize
+            rc = -1
+            while rc < 0 and fsize >= 4:  # while not enough room
+                rc = shape.insertTextbox(  # (re-) try insertion
+                    trect,
+                    redact["text"],
+                    fontname=redact["fontname"],
+                    fontsize=fsize,
+                    color=redact["text_color"],
+                    align=redact["align"],
+                )
+                fsize -= 0.5  # reduce font if unsuccessful
+    shape.commit()  # append new contents object
+    return True
+
+
+# ------------------------------------------------------------------------------
+# Remove potentially sensitive data from a PDF. Corresponds to the Adobe
+# Acrobat 'sanitize' function
+# ------------------------------------------------------------------------------
+def scrub(
+    doc,
+    attached_files=True,
+    clean_pages=True,
+    embedded_files=True,
+    hidden_text=True,
+    javascript=True,
+    metadata=True,
+    redactions=True,
+    remove_links=True,
+    reset_fields=True,
+    reset_responses=True,
+    xml_metadata=True,
+):
+    def remove_hidden(cont_lines):
+        """Remove hidden text from a PDF page.
+
+        Args:
+            cont_lines: list of /Contents lines, taken after the page has
+                been processed by page.cleanContents().
+
+        Returns:
+            List of lines without hidden text, or None if there was none.
+
+        Notes:
+            The input must have been created after the page's /Contents object(s)
+            have been cleaned with page.cleanContents(). This ensures a standard
+            formatting: one command per line, no double spaces between operators.
+            This allows for drastic simplification of this code.
+        """
+        out_lines = []  # will return this
+        in_text = False  # indicate if within BT/ET object
+        suppress = False  # indicate text suppression active
+        make_return = False
+        for line in cont_lines:
+            if line == "BT":  # start of text object
+                in_text = True  # switch on
+                out_lines.append(line)  # output it
+                continue
+            if line == "ET":  # end of text object
+                in_text = False  # switch off
+                out_lines.append(line)  # output it
+                continue
+            if line == "3 Tr":  # text suppression operator
+                suppress = True  # switch on
+                make_return = True
+                continue
+            if line[-2:] == "Tr" and line[0] != "3":
+                suppress = False  # text rendering changed
+                out_lines.append(line)
+                continue
+            if line == "Q":  # unstack command also switches off
+                suppress = False
+                out_lines.append(line)
+                continue
+            if suppress and in_text:  # suppress hidden lines
+                continue
+            out_lines.append(line)
+        if make_return:
+            return out_lines
+        else:
+            return None
+
+    if not doc.isPDF:  # only works for PDF
+        raise ValueError("not a PDF")
+    if doc.isEncrypted or doc.isClosed:
+        raise ValueError("closed or encrypted doc")
+
+    if clean_pages is False:
+        hidden_text = False
+        redactions = False
+
+    if metadata:
+        doc.setMetadata({})  # remove standard metadata
+
+    if not (xml_metadata or javascript):
+        xref_limit = 0
+    else:
+        xref_limit = doc.xrefLength()
+    for xref in range(1, xref_limit):
+        obj = doc.xrefObject(xref)  # get object definition source
+        # note: this string is formatted in a fixed, standard way by MuPDF.
+
+        if javascript and "/S /JavaScript" in obj:  # a /JavaScript action object?
+            obj = "<</S/JavaScript/JS()>>"  # replace with a null JavaScript
+            doc.updateObject(xref, obj)  # update this object
+            continue  # no further handling
+
+        if not xml_metadata or "/Metadata" not in obj:
+            continue
+
+        if "/Type /Metadata" in obj:  # delete any metadata object directly
+            doc._deleteObject(xref)
+            continue
+
+        obj_lines = obj.splitlines()
+        new_lines = []  # will receive remaining obj definition lines
+        found = False  # assume /Metadata not found
+        for line in obj_lines:
+            line = line.strip()
+            if not line.startswith("/Metadata "):
+                new_lines.append(line)  # keep this line
+            else:  # drop this line
+                found = True
+        if found:  # if removed /Metadata key, update object definition
+            doc.updateObject(xref, "\n".join(new_lines))
+
+    # remove embedded files
+    if embedded_files:
+        for name in doc.embeddedFileNames():
+            doc.embeddedFileDel(name)
+
+    for page in doc:
+        if reset_fields:
+            # reset form fields (widgets)
+            for widget in page.widgets():
+                widget.reset()
+                widget.update()
+
+        if remove_links:
+            links = page.getLinks()  # list of all links on page
+            for link in links:  # remove all links
+                page.deleteLink(link)
+
+        found_redacts = False
+        for annot in page.annots():
+            if annot.type[0] == PDF_ANNOT_FILEATTACHMENT and attached_files:
+                annot.fileUpd(buffer=b"")  # set file content to empty
+            if reset_responses:
+                annot.delete_responses()
+            if annot.type[0] == PDF_ANNOT_REDACT:
+                found_redacts = True
+
+        if redactions and found_redacts:
+            page.apply_redactions()
+
+        if not page.getContents():  # safeguard against empty /Contents
+            continue
+
+        if not (clean_pages or hidden_text):
+            continue  # done with the page
+
+        page.cleanContents()
+
+        if hidden_text:
+            xref = page.getContents()[0]  # only one, because of cleaning!
+            cont = doc.xrefStream(xref).decode()  # /Contents converted to str
+            cont_lines = remove_hidden(cont.splitlines())  # remove hidden text
+            if cont_lines:  # something was actually removed
+                cont = "\n".join(cont_lines).encode()
+                doc.updateStream(xref, cont)  # rewrite the page /Contents
+
+
+def fillTextbox(
+    writer, rect, text, pos=None, font=None, fontsize=11, align=0, warn=True
+):
+    """Fill a rectangle with text.
+
+    Args:
+        writer: TextWriter object (= "self")
+        rect: rect-like to receive the text.
+        text: string or list/tuple of strings.
+        pos: point-like start position of first word.
+        font: Font object (default Font('helv')).
+        fontsize: the fontsize.
+        align: (int) 0 = left, 1 = center, 2 = right, 3 = justify
+        warn: (bool) if True, only print a warning on text overflow, else raise ValueError.
+    """
+    textlen = lambda x: font.text_length(x, fontsize)  # just for abbreviation
+
+    rect = fitz.Rect(rect)
+    if rect.isEmpty or rect.isInfinite:
+        raise ValueError("fill rect must be finite and not empty.")
+
+    if type(font) is not Font:
+        font = Font("helv")
+
+    tolerance = fontsize * 0.25
+    width = rect.width - tolerance  # available horizontal space
+
+    len_space = textlen(" ")  # width of space character
+
+    # starting point of the text
+    if pos is not None:
+        pos = Point(pos)
+        if pos not in rect:
+            raise ValueError("'pos' must be inside 'rect'")
+    else:  # default is just below rect top-left
+        pos = rect.tl + (tolerance, fontsize * 1.3)
+
+    # calculate displacement factor for alignment
+    if align == fitz.TEXT_ALIGN_CENTER:
+        factor = 0.5
+    elif align == fitz.TEXT_ALIGN_RIGHT:
+        factor = 1.0
+    else:
+        factor = 0
+
+    # split in lines if just a string was given
+    if type(text) not in (tuple, list):
+        text = text.splitlines()
+
+    text = " \n".join(text).split(" ")  # split in words, preserve line breaks
+
+    # compute lists of words and word lengths
+    words = []  # recomputed list of words
+    len_words = []  # corresponding lengths
+
+    for word in text:
+        # fill the lists of words and their lengths;
+        # words longer than width are split into chunks, each of which is
+        # treated as a word itself.
+        if word.startswith("\n"):
+            len_word = textlen(word[1:])
+        else:
+            len_word = textlen(word)
+        if len_word <= width:  # simple case: word not longer than a line
+            words.append(word)
+            len_words.append(len_word)
+            continue
+        # deal with an extra long word
+        w = word[0]  # start with 1st char
+        l = textlen(w)  # and its length
+        for i in range(1, len(word)):
+            nl = textlen(word[i])  # next char length
+            if l + nl > width:  # if too long
+                words.append(w)  # append what we have so far
+                len_words.append(l)
+                w = word[i]  # start over with new char
+                l = nl  # and its length
+            else:  # if still fitting
+                w += word[i]  # just append char
+                l += nl  # and add its length
+        words.append(w)  # output tail of long word
+        len_words.append(l)  # output length of long word tail
+
+    idx = 0  # index of current word processed
+    line_ctr = 0  # counter for output lines
+    end_idx = len(words)  # number of words
+
+    # -------------------------------------------------------------------------
+    # each loop outputs one line
+    # -------------------------------------------------------------------------
+    while True:
+        if idx >= end_idx:  # all words processed
+            break
+
+        # compute the new insertion point
+        if line_ctr == 0 and len_words[0] >= rect.x1 - pos.x and idx == 0:
+            line_ctr = 1  # first word won't fit in first line: take next one
+
+        if line_ctr == 0:  # first line in rect
+            start = pos
+            width = rect.x1 - pos.x
+        else:
+            start = Point(rect.x0 + tolerance, pos.y + fontsize * 1.3 * line_ctr)
+            width = rect.width - tolerance
+
+        if start.y > rect.y1:  # landed below rectangle area
+            if warn:
+                print("Warning: only fitting %i of %i total words." % (idx, end_idx))
+                break
+            else:
+                raise ValueError("only fitting %i of %i total words." % (idx, end_idx))
+
+        word = words[idx]  # get first word for the line
+        if word.startswith("\n"):  # remove any leading line breaks
+            word = word[1:]
+
+        line = [word]  # list of words fitting in this line
+        len_line = [len_words[idx]]  # list of word lengths
+
+        exhausted = False  # switch indicating we are done
+        justify = True  # enable text justify as default
+        next_words = range(idx + 1, end_idx)  # remaining words in text
+
+        for i in next_words:  # try adding more words to the line
+            nw = words[i]  # next word
+            if nw.startswith("\n"):  # forced line break
+                justify = False  # do not justify this current line
+                break
+            tl = len_space + len_words[i]
+            if tl + sum(len_line) + (len(line) - 1) * len_space > width:  # won't fit
+                break
+            line.append(nw)  # append new word
+            len_line.append(len_words[i])  # add its length
+            if i >= end_idx - 1:  # if we exhausted the words
+                justify = False  # do not justify current line
+                exhausted = True  # and turn on switch
+
+        # finished preparing a line
+        if align != fitz.TEXT_ALIGN_JUSTIFY:  # trivial alignments
+            fin_len = sum(len_line) + (len(line) - 1) * len_space
+            d = (width - fin_len) * factor  # takes care of alignment
+            start.x += d
+            writer.append(start, " ".join(line), font, fontsize)
+        else:  # take care of justified alignment
+            writer.append(start, line[0], font, fontsize)  # always 1st word
+            if len(line) > 1:  # more than one word in the line
+                if justify is False:  # if no justify use space as gap
+                    gap = len_space
+                else:
+                    gap = (width - sum(len_line)) / (len(line) - 1)
+                this_gap = len_line[0] + gap  # gap for 2nd word
+                for j in range(1, len(line)):
+                    writer.append(start + (this_gap, 0), line[j], font, fontsize)
+                    this_gap += len_line[j] + gap  # gap for next word
+
+        if len(next_words) == 0 or exhausted is True:  # no words left
+            break
+
+        idx = i  # number of next word to read
+        line_ctr += 1  # line counter
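The line-building loop in `fillTextbox` is a greedy word wrap: each line takes words until the next one would exceed the available width, and a word wider than a whole line is first cut into chunks that are then treated as words. The sketch below isolates that idea in plain Python. It is not PyMuPDF code: `wrap_words` is a hypothetical helper, the caller supplies `text_length` (the role `font.text_length(x, fontsize)` plays in `fillTextbox`), and alignment, justification and the first-line `pos` offset are deliberately left out.

```python
from typing import Callable, List


def wrap_words(text: str, width: float,
               text_length: Callable[[str], float]) -> List[str]:
    """Greedily wrap text into lines no wider than `width` (hypothetical helper)."""
    space = text_length(" ")
    lines: List[str] = []
    for paragraph in text.splitlines():  # explicit newlines force a break
        # cut words wider than a whole line into chunks that fit
        words: List[str] = []
        for word in paragraph.split():
            chunk = ""
            for ch in word:
                if chunk and text_length(chunk + ch) > width:
                    words.append(chunk)
                    chunk = ch
                else:
                    chunk += ch
            words.append(chunk)
        # greedily add words while the line still fits
        current: List[str] = []
        current_len = 0.0
        for word in words:
            word_len = text_length(word)
            needed = current_len + space + word_len if current else word_len
            if current and needed > width:
                lines.append(" ".join(current))
                current, current_len = [word], word_len
            else:
                current.append(word)
                current_len = needed
        lines.append(" ".join(current))
    return lines
```

With a monospaced stand-in such as `lambda s: float(len(s))` and a width of 10, `"the quick brown fox jumps"` wraps to `["the quick", "brown fox", "jumps"]`.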
diff --git a/fitz/version.i b/fitz/version.i

new file mode 100644 (file)

index 0000000..8032634
--- /dev/null
+++ b/fitz/version.i
@@ -0,0 +1,6 @@
+%pythoncode %{
+VersionFitz = "1.17.0"
+VersionBind = "1.17.4"
+VersionDate = "2020-07-20 18:09:40"
+version = (VersionBind, VersionFitz, "20200720180940")
+%}
+\ No newline at end of file
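The values in `version.i` are related: the binding version (1.17.4) and the MuPDF version (1.17.0) share their first two components, and the third element of `version` is `VersionDate` with its separators removed. A small sketch that checks both relations, assuming (as these values and the `mupdf-1.17.0` downloads in the installation scripts suggest) that a 1.17.x binding expects MuPDF 1.17.x; `same_major_minor` and `parse_build_stamp` are hypothetical helper names.

```python
from datetime import datetime


def same_major_minor(binding_version: str, mupdf_version: str) -> bool:
    """True if both dotted versions agree in their first two components."""
    return binding_version.split(".")[:2] == mupdf_version.split(".")[:2]


def parse_build_stamp(stamp: str) -> datetime:
    """Parse a compact 'YYYYMMDDhhmmss' stamp such as version[2]."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


# values copied from fitz/version.i
VersionFitz = "1.17.0"
VersionBind = "1.17.4"
VersionDate = "2020-07-20 18:09:40"
version = (VersionBind, VersionFitz, "20200720180940")

# binding and MuPDF agree on major.minor; the stamp matches VersionDate
assert same_major_minor(VersionBind, VersionFitz)
assert parse_build_stamp(version[2]) == datetime.strptime(
    VersionDate, "%Y-%m-%d %H:%M:%S"
)
```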
diff --git a/installation/.DS_Store b/installation/.DS_Store

new file mode 100644 (file)

index 0000000..187ca86

Binary files /dev/null and b/installation/.DS_Store differ
diff --git a/installation/centos/centos_pymupdf.sh b/installation/centos/centos_pymupdf.sh

new file mode 100644 (file)

index 0000000..e60ec51
--- /dev/null
+++ b/installation/centos/centos_pymupdf.sh
@@ -0,0 +1,17 @@
+wget https://mupdf.com/downloads/mupdf-1.17.0-source.tar.gz
+tar -zxvf mupdf-1.17.0-source.tar.gz
+
+cd mupdf-1.17.0-source
+export CFLAGS="-fPIC -std=gnu99"
+
+make HAVE_X11=no HAVE_GLFW=no HAVE_GLUT=no prefix=/usr/local
+sudo make HAVE_X11=no HAVE_GLFW=no HAVE_GLUT=no prefix=/usr/local install
+
+cd ..
+
+rm -rf PyMuPDF
+git clone https://github.com/pymupdf/PyMuPDF.git
+cd PyMuPDF
+
+sudo python setup.py build
+sudo python setup.py install
diff --git a/installation/freebsd/freebsd_pymupdf.sh b/installation/freebsd/freebsd_pymupdf.sh

new file mode 100644 (file)

index 0000000..7fe71ef
--- /dev/null
+++ b/installation/freebsd/freebsd_pymupdf.sh
@@ -0,0 +1,22 @@
+setenv CFLAGS -fPIC
+
+# install the prerequisite tool (SWIG)
+pkg install swig30
+
+# Ensure we have a build of the current version
+wget https://mupdf.com/downloads/archive/mupdf-1.17.0-source.tar.gz
+tar -zxvf mupdf-1.17.0-source.tar.gz
+
+rm -rf PyMuPDF
+git clone https://github.com/pymupdf/PyMuPDF.git
+
+cd mupdf-1.17.0-source
+# replace the config header in the MuPDF source with PyMuPDF's version
+cp ../PyMuPDF/fitz/_config.h include/mupdf/fitz/config.h
+
+gmake HAVE_X11=no HAVE_GLFW=no HAVE_GLUT=no prefix=/usr/local
+gmake HAVE_X11=no HAVE_GLFW=no HAVE_GLUT=no prefix=/usr/local install
+
+cd ../PyMuPDF
+python setup.py build
+python setup.py install
diff --git a/installation/ubuntu/ubuntu_pymupdf.sh b/installation/ubuntu/ubuntu_pymupdf.sh

new file mode 100644 (file)

index 0000000..973419d
--- /dev/null
+++ b/installation/ubuntu/ubuntu_pymupdf.sh
@@ -0,0 +1,20 @@
+wget https://mupdf.com/downloads/archive/mupdf-1.17.0-source.tar.gz
+tar -zxvf mupdf-1.17.0-source.tar.gz
+
+cd mupdf-1.17.0-source
+
+export CFLAGS="-fPIC"
+# install some prerequisites
+sudo apt install pkg-config python-dev
+
+make HAVE_X11=no HAVE_GLFW=no HAVE_GLUT=no prefix=/usr/local
+sudo make HAVE_X11=no HAVE_GLFW=no HAVE_GLUT=no prefix=/usr/local install
+
+cd ..
+
+rm -rf PyMuPDF
+git clone https://github.com/pymupdf/PyMuPDF.git
+cd PyMuPDF
+
+sudo python setup.py build
+sudo python setup.py install
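After running one of the build scripts above, one way to confirm that the extension imports is to collect the details the bug report template in this commit asks for (`sys.version`, `sys.platform`, `fitz.__doc__`). The sketch below is not part of the scripts; `environment_report` is a hypothetical name, and it only reports an import failure rather than diagnosing it.

```python
import sys


def environment_report() -> str:
    """Return Python version, platform and PyMuPDF build info, one per line."""
    try:
        import fitz  # succeeds only if the build/install worked
        fitz_info = str(fitz.__doc__)
    except ImportError:
        fitz_info = "fitz (PyMuPDF) could not be imported"
    return "\n".join([sys.version, sys.platform, fitz_info])


if __name__ == "__main__":
    print(environment_report())
```

Run it with the same interpreter that executed `setup.py install`; a different `python` on the PATH would report on a different installation.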
diff --git a/setup.py b/setup.py

new file mode 100644 (file)

index 0000000..cf77a17
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,94 @@
+from distutils.core import setup, Extension
+from distutils.command.build_py import build_py as build_py_orig
+import sys, os
+
+# custom build_py command which runs build_ext first
+# this is necessary because build_py needs the fitz.py which is only generated
+# by SWIG in the build_ext step
+class build_ext_first(build_py_orig):
+    def run(self):
+        self.run_command("build_ext")
+        return super().run()
+
+
+# check the platform
+if sys.platform.startswith("linux"):
+    module = Extension(
+        "fitz._fitz",  # name of the module
+        ["fitz/fitz.i"],
+        include_dirs=[  # we need the path of the MuPDF headers
+            "/usr/include/mupdf",
+            "/usr/local/include/mupdf",
+        ],
+        # library_dirs=['<mupdf_and_3rd_party_libraries_dir>'],
+        libraries=[
+            "mupdf",
+            #'crypto', #openssl is required by mupdf on archlinux
+            #'jbig2dec', 'openjp2', 'jpeg', 'freetype',
+            "mupdf-third",
+        ],  # the libraries to link with
+    )
+elif sys.platform.startswith(("darwin", "freebsd")):
+    module = Extension(
+        "fitz._fitz",  # name of the module
+        ["fitz/fitz.i"],
+        # directories containing mupdf's header files
+        include_dirs=["/usr/local/include/mupdf", "/usr/local/include"],
+        # libraries should already be linked here by brew
+        library_dirs=["/usr/local/lib"],
+        # library_dirs=['/usr/local/Cellar/mupdf-tools/1.8/lib/',
+        #'/usr/local/Cellar/openssl/1.0.2g/lib/',
+        #'/usr/local/Cellar/jpeg/8d/lib/',
+        #'/usr/local/Cellar/freetype/2.6.3/lib/',
+        #'/usr/local/Cellar/jbig2dec/0.12/lib/'
+        # ],
+        libraries=["mupdf", "mupdf-third"],
+    )
+
+else:
+    # ===============================================================================
+    # Build / set up PyMuPDF under Windows
+    # ===============================================================================
+    module = Extension(
+        "fitz._fitz",
+        ["fitz/fitz.i"],
+        include_dirs=[  # we need the paths of the MuPDF headers
+            "./mupdf/include",
+            "./mupdf/include/mupdf",
+        ],
+        libraries=[  # these are needed in Windows
+            "libmupdf",
+            "libresources",
+            "libthirdparty",
+        ],
+        extra_link_args=["/NODEFAULTLIB:MSVCRT"],
+        # x86 dir of libmupdf.lib etc.
+        library_dirs=["./mupdf/platform/win32/Release"],
+        # x64 dir of libmupdf.lib etc.
+        # library_dirs=['./mupdf/platform/win32/x64/Release'],
+    )
+
+pkg_tab = open("PKG-INFO").read().split("\n")
+long_dtab = []
+classifier = []
+for l in pkg_tab:
+    if l.startswith("Classifier: "):
+        classifier.append(l[12:])
+        continue
+    if l.startswith(" "):
+        long_dtab.append(l.strip())
+long_desc = "\n".join(long_dtab)
+
+setup(
+    name="PyMuPDF",
+    version="1.17.4",
+    description="Python bindings for the PDF rendering library MuPDF",
+    long_description=long_desc,
+    classifiers=classifier,
+    url="https://github.com/pymupdf/PyMuPDF",
+    author="Jorj McKie, Ruikai Liu",
+    author_email="jorj.x.mckie@outlook.de",
+    cmdclass={"build_py": build_ext_first},
+    ext_modules=[module],
+    py_modules=["fitz.fitz", "fitz.utils", "fitz.__main__"],
+)
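`setup.py` builds `classifiers` and `long_description` from `PKG-INFO` with a small line filter: lines starting with `Classifier: ` become classifiers, and lines starting with a space are stripped and joined into the long description. The same rules, pulled into a hypothetical `parse_pkg_info` function so they can be tested in isolation, look roughly like this. The hypothetical `read_pkg_info` additionally uses a `with` block so the file handle is closed, and it assumes UTF-8, whereas the original relies on the platform's default encoding.

```python
from typing import List, Tuple

PREFIX = "Classifier: "


def parse_pkg_info(text: str) -> Tuple[List[str], str]:
    """Split PKG-INFO text into (classifiers, long description)."""
    classifiers: List[str] = []
    description_lines: List[str] = []
    for line in text.split("\n"):
        if line.startswith(PREFIX):
            classifiers.append(line[len(PREFIX):])  # same as l[12:]
        elif line.startswith(" "):
            description_lines.append(line.strip())  # indented description text
    return classifiers, "\n".join(description_lines)


def read_pkg_info(path: str = "PKG-INFO") -> Tuple[List[str], str]:
    # the context manager closes the file, unlike open(...).read()
    with open(path, encoding="utf-8") as f:
        return parse_pkg_info(f.read())
```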
.github/ISSUE_TEMPLATE/bug_report.md	[new file with mode: 0644]
.github/ISSUE_TEMPLATE/feature_request.md	[new file with mode: 0644]
.github/ISSUE_TEMPLATE/general-purpose.md	[new file with mode: 0644]
.gitignore	[new file with mode: 0644]
.vs/ProjectSettings.json	[new file with mode: 0644]
.vs/PyMuPDF/v15/.suo	[new file with mode: 0644]
.vs/PyMuPDF/v15/Browse.VC.db	[new file with mode: 0644]
.vs/VSWorkspaceState.json	[new file with mode: 0644]
.vs/slnx.sqlite	[new file with mode: 0644]
COPYING	[new file with mode: 0644]
GNU AFFERO GPL V3	[new file with mode: 0644]
PKG-INFO	[new file with mode: 0644]
README.md	[new file with mode: 0644]
demo/pymupdf.jpg	[new file with mode: 0644]
docs/PyMuPDF.ico	[new file with mode: 0644]
docs/algebra.rst	[new file with mode: 0644]
docs/annot.rst	[new file with mode: 0644]
docs/app1.rst	[new file with mode: 0644]
docs/app2.rst	[new file with mode: 0644]
docs/app3.rst	[new file with mode: 0644]
docs/app4.rst	[new file with mode: 0644]
docs/changes.rst	[new file with mode: 0644]
docs/classes.rst	[new file with mode: 0644]
docs/colors.rst	[new file with mode: 0644]
docs/colorspace.rst	[new file with mode: 0644]
docs/conf.py	[new file with mode: 0644]
docs/coop_low.rst	[new file with mode: 0644]
docs/device.rst	[new file with mode: 0644]
docs/displaylist.rst	[new file with mode: 0644]
docs/document.rst	[new file with mode: 0644]
docs/faq.rst	[new file with mode: 0644]
docs/font.rst	[new file with mode: 0644]
docs/functions.rst	[new file with mode: 0644]
docs/glossary.rst	[new file with mode: 0644]
docs/identity.rst	[new file with mode: 0644]
docs/images/img-4up.png	[new file with mode: 0644]
docs/images/img-7edges.png	[new file with mode: 0644]
docs/images/img-a-is--1.png	[new file with mode: 0644]
docs/images/img-adobe.png	[new file with mode: 0644]
docs/images/img-alpha-0.png	[new file with mode: 0644]
docs/images/img-alpha-1.png	[new file with mode: 0644]
docs/images/img-annots.jpg	[new file with mode: 0644]
docs/images/img-attach-result.jpg	[new file with mode: 0644]
docs/images/img-b-is-0.5.png	[new file with mode: 0644]
docs/images/img-binsetupdirs.png	[new file with mode: 0644]
docs/images/img-breadth.png	[new file with mode: 0644]
docs/images/img-c-is-0.5.png	[new file with mode: 0644]
docs/images/img-cake.png	[new file with mode: 0644]
docs/images/img-caret-annot.jpg	[new file with mode: 0644]
docs/images/img-circle.png	[new file with mode: 0644]
docs/images/img-clip.jpg	[new file with mode: 0644]
docs/images/img-colordb.png	[new file with mode: 0644]
docs/images/img-copy-speed-1.png	[new file with mode: 0644]
docs/images/img-copy-speed-2.png	[new file with mode: 0644]
docs/images/img-d-is--1.png	[new file with mode: 0644]
docs/images/img-drawBezier.png	[new file with mode: 0644]
docs/images/img-drawCurve.png	[new file with mode: 0644]
docs/images/img-drawSector1.png	[new file with mode: 0644]
docs/images/img-drawSector2.png	[new file with mode: 0644]
docs/images/img-drawcircle.jpg	[new file with mode: 0644]
docs/images/img-drawquad.jpg	[new file with mode: 0644]
docs/images/img-e-is-100.png	[new file with mode: 0644]
docs/images/img-embed-progress.jpg	[new file with mode: 0644]
docs/images/img-encoding.jpg	[new file with mode: 0644]
docs/images/img-encrypting.jpg	[new file with mode: 0644]
docs/images/img-even-odd.png	[new file with mode: 0644]
docs/images/img-extract-imga.jpg	[new file with mode: 0644]
docs/images/img-extract-imgb.jpg	[new file with mode: 0644]
docs/images/img-f-is-100.png	[new file with mode: 0644]
docs/images/img-filesizes.png	[new file with mode: 0644]
docs/images/img-freetext.jpg	[new file with mode: 0644]
docs/images/img-import-progress.jpg	[new file with mode: 0644]
docs/images/img-inkannot.jpg	[new file with mode: 0644]
docs/images/img-inserttext.jpg	[new file with mode: 0644]
docs/images/img-markedpdf.jpg	[new file with mode: 0644]
docs/images/img-markers.jpg	[new file with mode: 0644]
docs/images/img-matrix.png	[new file with mode: 0644]
docs/images/img-opacity.jpg	[new file with mode: 0644]
docs/images/img-original.png	[new file with mode: 0644]
docs/images/img-pdfjoiner.jpg	[new file with mode: 0644]
docs/images/img-pdftext.jpg	[new file with mode: 0644]
docs/images/img-planish.png	[new file with mode: 0644]
docs/images/img-point-unit.jpg	[new file with mode: 0644]
docs/images/img-polyline.png	[new file with mode: 0644]
docs/images/img-posterize.png	[new file with mode: 0644]
docs/images/img-pymupdf.jpg	[new file with mode: 0644]
docs/images/img-quads.jpg	[new file with mode: 0644]
docs/images/img-redact.jpg	[new file with mode: 0644]
docs/images/img-render-speed.png	[new file with mode: 0644]
docs/images/img-rendermode.jpg	[new file with mode: 0644]
docs/images/img-rot+morph.png	[new file with mode: 0644]
docs/images/img-rot-60.png	[new file with mode: 0644]
docs/images/img-rotate.png	[new file with mode: 0644]
docs/images/img-showpdfpage.jpg	[new file with mode: 0644]
docs/images/img-sierpinski.png	[new file with mode: 0644]
docs/images/img-squiggly.png	[new file with mode: 0644]
docs/images/img-stampannot.jpg	[new file with mode: 0644]
docs/images/img-stencil.jpg	[new file with mode: 0644]
docs/images/img-symbols.jpg	[new file with mode: 0644]
docs/images/img-target.png	[new file with mode: 0644]
docs/images/img-textbox.jpg	[new file with mode: 0644]
docs/images/img-textboxtract.png	[new file with mode: 0644]
docs/images/img-textmarker.jpg	[new file with mode: 0644]
docs/images/img-textmethods.png	[new file with mode: 0644]
docs/images/img-textpage-char.png	[new file with mode: 0644]
docs/images/img-textpage.png	[new file with mode: 0644]
docs/images/img-textperformance.png	[new file with mode: 0644]
docs/images/img-timings.png	[new file with mode: 0644]
docs/images/img-writeimage.png	[new file with mode: 0644]
docs/images/mupdf-icons.jpg	[new file with mode: 0644]
docs/index.rst	[new file with mode: 0644]
docs/installation.rst	[new file with mode: 0644]
docs/intro.rst	[new file with mode: 0644]
docs/irect.rst	[new file with mode: 0644]
docs/kerning.style	[new file with mode: 0644]
docs/link.rst	[new file with mode: 0644]
docs/linkdest.rst	[new file with mode: 0644]
docs/lowlevel.rst	[new file with mode: 0644]
docs/make-bold.py	[new file with mode: 0644]
docs/matrix.rst	[new file with mode: 0644]
docs/module.rst	[new file with mode: 0644]
docs/multiprocess-gui.py	[new file with mode: 0644]
docs/multiprocess-render.py	[new file with mode: 0644]
docs/new-annots.py	[new file with mode: 0644]
docs/outline.rst	[new file with mode: 0644]
docs/page.rst	[new file with mode: 0644]
docs/pixmap.rst	[new file with mode: 0644]
docs/point.rst	[new file with mode: 0644]
docs/pymupdf-logo.jpg	[new file with mode: 0644]
docs/quad.rst	[new file with mode: 0644]
docs/rect.rst	[new file with mode: 0644]
docs/replace-fonts.py	[new file with mode: 0644]
docs/shape.rst	[new file with mode: 0644]
docs/text-lister.py	[new file with mode: 0644]
docs/textpage.rst	[new file with mode: 0644]
docs/textwriter.rst	[new file with mode: 0644]
docs/tools.rst	[new file with mode: 0644]
docs/tutorial.rst	[new file with mode: 0644]
docs/vars.rst	[new file with mode: 0644]
docs/version.rst	[new file with mode: 0644]
docs/wheelnames.txt	[new file with mode: 0644]
docs/widget.rst	[new file with mode: 0644]
fitz/__init__.py	[new file with mode: 0644]
fitz/__main__.py	[new file with mode: 0644]
fitz/fitz.i	[new file with mode: 0644]
fitz/helper-annot.i	[new file with mode: 0644]
fitz/helper-convert.i	[new file with mode: 0644]
fitz/helper-defines.i	[new file with mode: 0644]
fitz/helper-fields.i	[new file with mode: 0644]
fitz/helper-geo-c.i	[new file with mode: 0644]
fitz/helper-geo-py.i	[new file with mode: 0644]
fitz/helper-other.i	[new file with mode: 0644]
fitz/helper-pdfinfo.i	[new file with mode: 0644]
fitz/helper-pixmap.i	[new file with mode: 0644]
fitz/helper-portfolio.i	[new file with mode: 0644]
fitz/helper-python.i	[new file with mode: 0644]
fitz/helper-select.i	[new file with mode: 0644]
fitz/helper-stext.i	[new file with mode: 0644]
fitz/helper-xobject.i	[new file with mode: 0644]
fitz/utils.py	[new file with mode: 0644]
fitz/version.i	[new file with mode: 0644]
installation/.DS_Store	[new file with mode: 0644]
installation/centos/centos_pymupdf.sh	[new file with mode: 0644]
installation/freebsd/freebsd_pymupdf.sh	[new file with mode: 0644]
installation/ubuntu/ubuntu_pymupdf.sh	[new file with mode: 0644]
setup.py	[new file with mode: 0644]