Imported Upstream version 3.002

author gregor herrmann <gregoa@debian.org>

Wed, 27 Aug 2014 02:35:27 +0000 (19:35 -0700)

committer gregor herrmann <gregoa@debian.org>

Wed, 27 Aug 2014 02:35:27 +0000 (19:35 -0700)
diff --git a/Changes b/Changes
index 993c892ee445e7c085b02c00f5e9e8869c21b5f0..57427884a1997cd86b639203655992de9a6091ba 100644
--- a/Changes
+++ b/Changes
@@ -3,60 +3,26 @@ Revision history for Perl extension Sereal-Encoder
  * Warning: For a seamless upgrade, upgrade to version 3
  *          of the decoder before upgrading to version 3 of the
  *          encoder!
-3.001_011 Tues, Aug 12 2014
-  - Remove use of defined-or in t/lib/TestSet.pm
  
-3.001_010 Tues, Aug 12 2014
-  - Cleanup and enhance the "alternates" testing in t/010_desperate.t
+3.002 Aug 20 2014
+  Summary of changes from 3.001 - 3.002
+  - Introduce "canonical" option to encoder
+  - Introduce "canonical_refs" option to encoder
  
-3.001_007 .. 3.001_009
-  - Try to fix t/010_desperate.t on threaded perls (yes that many releases. sigh)
+  * Test Infra Changes
+  - Split up bulk tests to speed up testing and make it easier
+    to see when a failure is restricted to a specific option.
  
-3.001_006 Sun, Aug 03 2014
-  - Rework bulk tests so we test more, but report less tests.
-    The test infrastructure doesn't play well with lots of tests
-    in a file. Similarly, if we fail one of the methods in the bulk
-    tests we stop testing the rest.
-  - Add a canonical mode to the encoder.
-  - More tests.
-
-3.001_005 Mon, July 28 2014
-  - Fixup how MakeMaker runs the tests.
-
-3.001_004 Sun, July 27 2014
-  - Rework bulk tests so that tests are grouped by options and version
-  - Fixups for non-x86 architectures.
-
-3.001_003
-3.001_002
-  - Attempts to fix builds on sparc, s390x and ARM.
-
-3.001_001
-  - Patches from Jarkko Hietaniemi to make Sereal pass test
-    on HP-UX, and other machines with endian or alignedness
-    issues. Thanks to H.Merijn Brand for assisting and providing
-    access to test machines.
+  * Big-Endian Support
+  - Improved support for Big-Endian machines. We now build and pass tests
+    on Sparc and HP-UX and other platforms with big-endian or strict
+    alignedness requirements. Much thanks to Jarkko Hietaniemi,
+    Gregor Herrmann, and H. Merijn Brand for their assistance with
+    this.
+  - We still have issues with s390x (Z/Os) with Sereal. If someone wants
+    to help it would be appreciated.
  
  3.001
-  - Production release 1 of protocol version 3
-  - Zlib support
-  - CANONICAL_UNDEF,
-  - new magic header to make it easier to detect
-    UTF8 encoded data.
-  - Minor changes to how scalar values are serialized
-    to favour more compact representations.
-
-3.000_004
-  - Fix issues in new serialization rules with tied arrays
-    on older Perls revealed by breakage in t/400_evil.t
-
-3.000_003
-  - Sync release with decoder.
-
-3.000_002
-  - Minor protocol changes to magic header definition.
-
-3.000_001
    - Upgrade to version 3 of the protocol
      * Add Zlib compression support to the protocol
      * Add Zlib support to Encoder/Decoder
@@ -70,285 +36,5 @@ Revision history for Perl extension Sereal-Encoder
      (this is to handle engineering notation like "0E0"
      where numeric and string equivalence may differ)
  
-2.12 Sun May 11 23:30
-  - Synchronization release with Decoder changes.
-
-2.11 Sun Apr 13 23:04
-  - Work around regression in Perl 5.16.3 - 5.17.0
-    As of 8ae39f603f0f5778c160e18e08df60 while each
-    automagically becomes while $_= defined(each);
-    which manages to break some of our test code.
-
-2.10 Sun Apr 13 21:30
-  - Fix broken MANIFEST
-
-2.09 Sun Apr 13 21:15
-  - Synchro release with Decoder change.
-
-2.08 Thu Apr 10 22:10 2013
-  - Production release for previous changes.
-
-2.070_103 Wed Apr 09 00:33 2013 * DEV RELEASE *
-  - Synchronization release with Decoder. No changes.
-
-2.070_102 Sun Apr 06 17:27 2013 * DEV RELEASE *
-  - Fixes for how we load XS so Sereal.pm works properly
-    with dev releases.
-
-2.070_101 Sun Apr 06 17:27 2013 * DEV RELEASE *
-  - Fix for newer perls.
-  - Changes to 'fixver.pl' and version numbering so we do
-    a 3 digit minor version, and a 3 digit dev version,
-    so once this dev release cycle is done we will be at
-    v2.071 everywhere. This eliminates a version numbering
-    inconsistency in Sereal.pm from Encoder.pm and Decoder.pm
-
-2.07_01 Wed Mar 26 18:10 2014 * DEV RELEASE *
-  - Fix for aliased_dedupe_strings feature (Borislav Nikolov)
-  - Add sereal_decode_with_object(), a functional/custom-opcode
-    implementation of the OO interface, with much less overhead.
-    In practice this will make a very modest impact on dumping,
-    but if your applications needs it...
-    Thanks to Zefram for the custom op implementation.
-  - Optimize dumping hashes by being more careful how we
-    check if they have backreferences, and avoid creating
-    a HvAUX() structure (and thus reallocing the hashes bucket
-    array) just to find out if they have backreferences.
-    Reported by Steffen.
-
-2.06 Sun Mar  0 11:40 2014 (AMS time)
-  - Only minor changes.
-
-2.05 Fri Mar  7 10:30 2014 (AMS time)
-  - Fix rt.cpan.org #93560 - Encoder object wasn't re-entrant from
-    FREEZE calls.
-
-2.04 Wed Mar  5 18:15 2014 (AMS time)
-  - Fix rt.cpan.org #93484 - fencepost error in Encoder.xs (Zefram)
-
-2.03 Tue Jan  7 20:00 2014 (AMS time)
-  - (Hopefully) final fixes to FREEZE/THAW functionality:
-    => Add safe assertion to make sure that we don't segfault on invalid
-       data.
-    => Fix encoding/decoding of data structures with repeated references
-       to the same instance of a class that has FREEZE/THAW hooks.
-       Thanks to Christian Hansen for a test case.
-  - Distribution dependency fix.
-
-2.02 Mon Jan  6 15:00 2014 (AMS time)
-  - Fundamental fixes for FREEZE/THAW support in previous Sereal v2
-    releases. If you plan to use FREEZE/THAW, make sure you have 2.02
-    or better (dito for the decoder).
-
-2.01 Tue Dec 31 08:15 2013 (AMS time)
-  - Promoting changes from 0.37 to 2.00_03 to a stable release.
-    (This being the first protocol v2 stable release.)
-  - Minor performance tweaks.
-
-2.00_03 Sun Dec 29 10:33 2013 (AMS time)
-  - FREEZE/THAW hooks for object serialization.
-  - Test improvements (allowing for partial parallel run)
-  - Minor optimizations.
-
-2.00_02 Mon Oct 28 19:32 2013 (AMS time)
-  - Sereal::Encoder now requires Sereal::Decoder for better testing.
-  - Fix Test::Warn dependency problem of 2.00_01.
-
-2.00_01 tue Oct 1 07:34 2013 (AMS time)
-  - NEW PROTOCOL VERSION: V2
-  - User-data in header functionality: You may embed arbitrary
-    Sereal-serializable data in a document header. The document
-    header isn't compressed, so this is ideal for retrieving
-    small chunks of meta-data (eg. routing information) without
-    having to deserialize the entire document.
-  - Relocatable Sereal document bodies
-  - Encoder never emits non-incremental Snappy encoding for V2
-  - Offsets now 1-based in relocatable format, not 0
-  - Fixed VERY obscure (and rare) memory leak.
-  - Improved error messages
-  - Remove warning about Sereal not being production-grade
-    (because it IS).
-  - Detect when the Snappy compression was net negative in size
-    and back out
-  - C89/Windows fixes (bulk88)
-  - 5.18 compat: Skip test failing due to hash-randomization (Zefram)
-
-0.37 Mon Sep 2 07:40 2013 (AMS time)
-  - Windows and C89 fixes
-  - Band-aid: Skip test failing due to hash-randomization [Zefram]
-
-0.36 Tue May 7 12:00 2013 (AMS time)
-  [changelog for encoder and decoder both]
-  - Add "incremental" option to decoder for easier decoding of
-    multiple sereal documents in one buffer.
-  - Make snappy and snappy_incr options mutually exclusive.
-  - Feature: Implement aliasing for deduping (aliased_dedupe_strings)
-
-0.35 Mon Apr 1 11:50 2013 (AMS time)
-  - Add new no_bless_objects option from Simon Bertrang.
-
-0.34 Sat Mar 23 18:59:18 2013 (AMS time)
-  - Fixup Manifest
-
-0.33 Sun Feb 17 17:26 2013 (AMS time)
-  - Fix problem with hv_backrefs (Issue #27)
-
-0.32 Sun Feb 17 15:06 2013 (AMS time)
-  - Add "dedupe_strings" option, which will make
-    the encoder do extra work to dedupe string values
-    in the serialized output.
-
-0.31 Sun Feb 17 15:06 2013 (AMS time)
-  - Daniel Dragan <bulk88@hotmail.com> spent a bunch of time
-    digging into the weird problems we were having with Snappy
-    encoded data on Windows on certain builds. Turned out that
-    it was right broken, and worked sometimes purely by chance.
-    He kindly provided a patch.
-
-0.30 Wed Feb 13 06:21 2013 (AMS time)
-  - Found a work around for VC6 Windows 32 bit builds
-    Compile was "optimizing" float comparisons to use 80 bit precision
-    regardless of type, this release uses a workaround of marking the
-    relevant vars "volatile".
-
-0.29 Sat Feb 09 18:09 2013 (AMS time)
-  - Dummy release to keep Encoder in sync with Decoder.
-
-0.28 Sat Feb 09 16:20 2013 (AMS time)
-  - More fixups for building on Win32/C89 compilers
-  - Eliminate unnecessary use of strlen.
-
-0.27 Sat Feb 09 12:58 2013 (AMS time)
-  - Various fixups to improve building on Win32
-  - Fix C89 violations
-
-0.26 Sun Feb 03 13:45 2013
-  - Compatibility with perl 5.17.6 (5.18-to-be) regarding regular
-    expression encoding.
-  - Fixed Changelog order (why would I ever have listed oldest first?)
-
-0.25 Tue Jan 22 18:00 2013
-  - Various compatibility fixes with old versions of Perl.
-    Specifically, fixes to regular expression handling that should help
-    with 5.10 support, as well as fixes that should improve the status
-    quo on 5.8.
-  - Potential fix for a leak wrt. regular expression support.
-  - Fewer compiler warnings on 32bit/gcc.
-
-0.24 - unreleased
-
-0.23 Tue Jan 08 07:23 2013
-  * Important bug fix release *
-  * Warning *
-    Before using the incremental Snappy mode described below, you must
-    upgrade the Sereal::Decoder to version 0.23 or higher!
-  - Support for the 'snappy_incr' option, which uses a new Snappy
-    compression format that is suitable for parsing multiple Sereal
-    documents from a large buffer. A bug in the previous implementation
-    of Snappy-compression resulted in the Decoder failing if the
-    buffer (Perl input string) extended beyond the length of the
-    Snappy-compressed Sereal document.
-    If this confuses you, then:
-      => If you're not using Snappy compression, move on.
-      => If you are, but you're not extracting Sereal documents
-         from larger strings, consider upgrading or move on.
-      => If you're using Snappy compression and might want to extract
-         Sereal documents from larger strings, then please:
-         1) Upgrade Sereal::Decoder and Sereal::Encoder everywhere.
-         2) Then swap the "snappy" option of the encoder for the
-            "snappy_incr" option.
-  - Support for the 'sort_keys' option, which outputs hash keys in
-    consistent order (but see gotchas in documentation).
-
-0.22 - unreleased
-0.21 - unreleased
-
-0.20  Fri Nov 23 15:35 2012
-  - Configurable recursion limit for the Encoder.
-  - Fix hard-crash issue with weak-refs to certain data structures
-    (issue #11 on github). Thanks to Andrew Yates for helping us debug
-    the problem!
-    => Regression tests still pending.
-
-0.19 - unreleased
-
-0.18  Wed Nov 14 07:30 2012
-  * This release contains critical bug fixes *
-  - Fix output data corruption in encoder when serializing an incompatible
-    data structure with refcount > 1 with the "stringify_unknown" option.
-
-0.17  Mon Oct 29 12:00 2012
-  * This release contains critical bug fixes *
-  - Fix pointer-stashing-broken-by-realloc-from-under-it problem by
-    using offsets instead.
-    This bug could cause you Perl to segfault.
-
-0.16  Thu Oct 25 12:00 2012
-  - Re-entrancy fix for obscure cases like calling into Sereal from
-    $SIG{__DIE__} if the exception was thrown from within Sereal.
-    (A bit of a "don't do that" case)
-
-0.15  Wed Oct 17 13:00 2012
-  - Thread-safety fix on Perls >= 5.8.7. Sereal is still not thread-safe
-    on older Perls 
-
-0.14  Wed Oct 10 11:11 2012
-  - The 'warn_unknown' option now optionally does NOT emit a warning
-    if the unsupported item is a blessed object with string overloading.
-
-0.13 - unreleased
-
-0.12  Wed Sep 19 08:00 2012
-  * Important bug fix *
-  - Under certain circumstances, an encoder object could be left
-    in an unclean state when an encoding operation failed via
-    an exception.
-
-0.11  Tue Sep 18 13:00 2012
-  - 5.8.5 fixes.
-  - Fixes to other languages' reference data output.
-
-0.10  Mon Sep 17 14:00 2012
-  - Perl 5.10 regular-expression-related build fixes.
-
-0.09  Fri Sep 14 10:00 2012
-  - Export functions by default when loaded from one liner
-  - More liberal set of decoder versions that we can run full tests against
-
-0.08  Thu Sep 13 17:00 2012
-  - 'snappy_threshold' option which controls at which minimum packet size
-    we start compressing with Snappy at all (if Snappy enabled)
-  - More tests.
-
-0.07  Tue Sep 11 14:00 2012
-  - "undef_unknown" option will cause unsupported Perl types to be
-    encoded as "undef" instead of throwing an exception.
-  - Similarly, "stringify_unknown" will make those unsupported types
-    be stringified instead. The two options are mutually exclusive.
-  - "warn_unknown" option (only meaningful if "stringify_unknown" or
-    "undef_unknown" are active) will cause a warning to be issued when
-    an unsupported type is encoded as a string or as undef.
-  - Bug fixes for encoding the contents of tied hashes (the tiedness
-    itself is not preserved by design).
-  - Solaris build fix.
-  - Test fixes for threaded perls (likely working around a bug in Perl
-  - Improved documentation.
-
-0.06  Mon Sep 10 11:00 2012
-  - First public release (CPAN).
-  - Beta quality software.
-
-0.05  Fri Sep  7 14:00 2012
-  - internal release.
-
-0.04  Thu Sep  6 16:00 2012
-  - internal release.
-
-0.03  Tue Sep  4 17:09 2012
-  - internal release.
-
-0.02  Tue Aug  8 17:09 2012
-  - internal release.
+Full change history available at https://github.com/Sereal/Sereal
  
-0.01  Tue Aug  8 17:09 2012
-  - original version; internal release.
diff --git a/Encoder.xs b/Encoder.xs
index 6f7d5751f49654cb5ff1e8ad92dff93a74a43801..bb2a974cab2e163fabf2f6d27383d135d422eaad 100644
--- a/Encoder.xs
+++ b/Encoder.xs
@@ -112,7 +112,8 @@ THX_ck_entersub_args_sereal_encode_with_object(pTHX_ OP *entersubop, GV *namegv,
    pushop->op_sibling = cvop;
    lastargop->op_sibling = NULL;
    op_free(entersubop);
-  newop = newUNOP(OP_CUSTOM, 0, firstargop);
+  newop = newUNOP(OP_NULL, 0, firstargop);
+  newop->op_type    = OP_CUSTOM;
    newop->op_private = arity == 3;
    newop->op_ppaddr = THX_pp_sereal_encode_with_object;
  
diff --git a/META.json b/META.json
index 1deadaed07949afa74a65fc9d72b03ea8fadfe5d..aef720dd9d6b62e7bf21c3317cf3198beb6674e4 100644
--- a/META.json
+++ b/META.json
@@ -4,7 +4,7 @@
        "Steffen Mueller <smueller@cpan.org>, Yves Orton <yves@cpan.org>"
     ],
     "dynamic_config" : 1,
-   "generated_by" : "ExtUtils::MakeMaker version 6.6302, CPAN::Meta::Converter version 2.120630",
+   "generated_by" : "ExtUtils::MakeMaker version 6.9, CPAN::Meta::Converter version 2.141520",
     "license" : [
        "perl_5"
     ],
@@ -46,7 +46,7 @@
           }
        }
     },
-   "release_status" : "testing",
+   "release_status" : "stable",
     "resources" : {
        "bugtracker" : {
           "web" : "https://github.com/Sereal/Sereal/issues"
@@ -55,5 +55,5 @@
           "url" : "git://github.com/Sereal/Sereal.git"
        }
     },
-   "version" : "3.001_012"
+   "version" : "3.002"
  }
diff --git a/META.yml b/META.yml
index 2f0d7aa2363a2dd80be26b1efb1ebd8d58074482..4093ecf547aa20dfef10b971d8fcaac74ecb8f7c 100644
--- a/META.yml
+++ b/META.yml
@@ -3,33 +3,33 @@ abstract: 'Fast, compact, powerful binary serialization'
  author:
    - 'Steffen Mueller <smueller@cpan.org>, Yves Orton <yves@cpan.org>'
  build_requires:
-  Data::Dumper: 0
-  ExtUtils::ParseXS: 2.21
-  File::Find: 0
-  File::Path: 0
-  File::Spec: 0
-  Scalar::Util: 0
-  Sereal::Decoder: 3.00
-  Test::LongString: 0
-  Test::More: 0.88
-  Test::Warn: 0
+  Data::Dumper: '0'
+  ExtUtils::ParseXS: '2.21'
+  File::Find: '0'
+  File::Path: '0'
+  File::Spec: '0'
+  Scalar::Util: '0'
+  Sereal::Decoder: '3.00'
+  Test::LongString: '0'
+  Test::More: '0.88'
+  Test::Warn: '0'
  configure_requires:
-  ExtUtils::MakeMaker: 0
+  ExtUtils::MakeMaker: '0'
  dynamic_config: 1
-generated_by: 'ExtUtils::MakeMaker version 6.6302, CPAN::Meta::Converter version 2.120630'
+generated_by: 'ExtUtils::MakeMaker version 6.9, CPAN::Meta::Converter version 2.141520'
  license: perl
  meta-spec:
    url: http://module-build.sourceforge.net/META-spec-v1.4.html
-  version: 1.4
+  version: '1.4'
  name: Sereal-Encoder
  no_index:
    directory:
      - t
      - inc
  requires:
-  XSLoader: 0
-  perl: 5.008
+  XSLoader: '0'
+  perl: '5.008'
  resources:
    bugtracker: https://github.com/Sereal/Sereal/issues
    repository: git://github.com/Sereal/Sereal.git
-version: 3.001_012
+version: '3.002'
diff --git a/author_tools/hobodecoder.pl b/author_tools/hobodecoder.pl
index 09878a0fa2f86a64c54ebabe660f536cef851ce9..f0e6f0621543ee24b4c21162c7209ef6bae14504 100644
--- a/author_tools/hobodecoder.pl
+++ b/author_tools/hobodecoder.pl
@@ -96,7 +96,6 @@ sub parse_long_double {
      die "Long double not supported" unless $len_D;
      my $v= substr($data, 0, $len_D, "");
      $done .= $v;
-    warn "long double size: " . $len_D;
      return unpack("D",$v);
  }
  
diff --git a/const-c.inc b/const-c.inc
index 91e964b45cc5b1ca2bbd66cb1f3f5ed49a5ca509..1c8e83099f736d0d1318571ba97a566087b5db42 100644
--- a/const-c.inc
+++ b/const-c.inc
@@ -763,7 +763,7 @@ constant (pTHX_ const char *name, STRLEN len, IV *iv_return) {
       Regenerate these constant functions by feeding this entire source file to
       perl -x
  
-#!/home/yorton/perl5/perlbrew/perls/perl-5.16.3-ld/bin/perl -w
+#!/usr/bin/perl -w
  use ExtUtils::Constant qw (constant_types C_constant XS_constant);
  
  my $types = {map {($_, 1)} qw(IV)};
diff --git a/lib/Sereal/Encoder.pm b/lib/Sereal/Encoder.pm
index 7625ac49dd6ac1d4850b07c607f9246f02a69f35..abb0285e9683e0f98d753eebdb055633e3f2166e 100644
--- a/lib/Sereal/Encoder.pm
+++ b/lib/Sereal/Encoder.pm
@@ -5,7 +5,7 @@ use warnings;
  use Carp qw/croak/;
  use XSLoader;
  
-our $VERSION = '3.001_012'; # Don't forget to update the TestCompat set for testing against installed decoders!
+our $VERSION = '3.002'; # Don't forget to update the TestCompat set for testing against installed decoders!
  our $XS_VERSION = $VERSION; $VERSION= eval $VERSION;
  
  # not for public consumption, just for testing.
@@ -50,13 +50,13 @@ Sereal::Encoder - Fast, compact, powerful binary serialization
  =head1 SYNOPSIS
  
    use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);
-  
+
    my $encoder = Sereal::Encoder->new({...options...});
    my $out = $encoder->encode($structure);
-  
+
    # alternatively the functional interface:
    $out = sereal_encode_with_object($encoder, $structure);
-  
+
    # much slower functional interface with no persistent objects:
    $out = encode_sereal($structure, {... options ...});
  
@@ -278,8 +278,8 @@ gain if you plan to serialize multiple similar data structures, but destroy
  it if you serialize a single very large data structure just once to free
  the memory.
  
-See L</NON-CANONICAL> for why you might want to use this, and for the
-various caveats involved.
+See L</CANONICAL REPRESENTATION> for why you might want to use this, and
+for the various caveats involved.
  
  =head3 no_shared_hashkeys
  
@@ -439,12 +439,12 @@ Here is a contrived example of a class implementing the C<FREEZE> / C<THAW> mech
  
    package
      File;
-  
+
    use Moo;
-  
+
    has 'path' => (is => 'ro');
    has 'fh' => (is => 'rw');
-  
+
    # open file handle if necessary and return it
    sub get_fh {
      my $self = shift;
@@ -456,7 +456,7 @@ Here is a contrived example of a class implementing the C<FREEZE> / C<THAW> mech
      }
      return $fh;
    }
-  
+
    sub FREEZE {
      my ($self, $serializer) = @_;
      # Could switch on $serializer here: JSON, CBOR, Sereal, ...
@@ -465,7 +465,7 @@ Here is a contrived example of a class implementing the C<FREEZE> / C<THAW> mech
      # to recreate.
      return $self->path;
    }
-  
+
    sub THAW {
      my ($class, $serializer, $data) = @_;
      # Turn back into object.
@@ -491,25 +491,57 @@ C<Sereal::Encoder> objects will become a reference to undef in the new
  thread. This might change in a future release to become a full clone
  of the encoder object.
  
-=head1 NON-CANONICAL 
+=head1 CANONICAL REPRESENTATION
  
  You might want to compare two data structures by comparing their serialized
  byte strings.  For that to work reliably the serialization must take extra
  steps to ensure that identical data structures are encoded into identical
  serialized byte strings (a so-called "canonical representation").
  
-Currently the Sereal encoder I<does not> provide a mode that will reliably
-generate a canonical representation of a data structure. The reasons are many
-and sometimes subtle.
-
-Sereal does support some use-cases however. In this section we attempt to outline
-the issues well enough for you to decide if it is suitable for your needs.
+Unfortunately in Perl there is no such thing as a "canonical representation".
+Most people are interested in "structural equivalence" but even that is less
+well defined than most people think. For instance in the following example:
+
+    my $array1= [ 0, 0 ];
+    my $array2= do {
+        my $zero= 0;
+        sub{ \@_ }->($zero,$zero);
+    };
+
+the question of whether C<$array1> is structurally equivalent to C<$array2>
+is a subjective one. Sereal for instance would B<NOT> consider them
+equivalent but C<Test::Deep> would.  There are many examples of this in
+Perl. Simply stringifying a number technically changes the scalar. Storable
+would notice this, but Sereal generally would not.
+
+Despite this, as of 3.002 the Sereal encoder supports a "canonical" option
+which will make a "best effort" attempt at producing a canonical
+representation of a data structure.  This mode is actually a combination of
+several other modes which may also be enabled independently, and as and when
+we add new options to the encoder that would assist in this regard then
+the C<canonical> option will also enable them. These options may come with a
+performance penalty so care should be taken to read the Changes file and
+test the performance implications when upgrading a system that uses this
+option.
+
+It is important to note that using canonical representation to determine
+if two data structures are different is subject to false positives. If
+two Sereal encodings are identical you can generally assume that the
+two data structures are functionally equivalent from the point of view of
+normal Perl code (XS code might disagree). However if two Sereal
+encodings differ the data structures may actually be functionally
+equivalent.  In practice it seems that the false-positive rate is low,
+but your mileage may vary.
+
+Some of the issues with producing a true canonical representation are
+outlined below:
  
  =over 4
  
  =item Sereal doesn't order the hash keys by default.
  
-This can be enabled via C<sort_keys>, see above.
+This can be enabled via the C<sort_keys> option, which is itself enabled by
+the C<canonical> option.
  
  =item Sereal output is sensitive to refcounts
  
@@ -517,7 +549,15 @@ This can be somewhat mitigated by the use of C<canonical_refs>, see above.
  
  =item There are multiple valid Sereal documents that you can produce for the same Perl data structure.
  
-Just L<sorting hash keys|/sort_keys> is not enough. A trivial example is PAD bytes which
+Just L<sorting hash keys|/sort_keys> is not enough.  Some of the reasons
+are outlined below. These issues are especially relevant when considering
+language interoperability.
+
+=over 4
+
+=item PAD bytes
+
+A trivial example is PAD bytes which
  mean nothing and are skipped. They mostly exist for encoder optimizations to
  prevent certain nasty backtracking situations from becoming O(n) at the cost of
  one byte of output. An explicit canonical mode would have to outlaw them (or
@@ -526,6 +566,8 @@ refcount/weakref handing in the encoder while at the same time causing some
  operations to go from O(1) to a full memcpy of everything after the point of
  where we backtracked to. Nasty.
  
+=item COPY tag
+
  Another example is COPY. The COPY tag indicates that the next element is an
  identical copy of a previous element (which is itself forbidden from including
  COPY's other than for class names). COPY is purely internal. The Perl/XS
@@ -533,6 +575,8 @@ implementation uses it to share hash keys and class names. One could use it for
  other strings (theoretically), but doesn't for time-efficiency reasons. We'd
  have to outlaw the use of this (significant) optimization of canonicalization.
  
+=item REF representation
+
  Sereal represents a reference to an array as a sequence of
  tags which, in its simplest form, reads I<REF, ARRAY $array_length TAG1 TAG2 ...>.
  The separation of "REF" and "ARRAY" is necessary to properly implement all of
@@ -543,7 +587,10 @@ into a special one byte ARRAYREF tag. This is a very significant optimization
  for common cases. This, however, does mean that most arrays up to 15 elements
  could be represented in two different, yet perfectly valid forms. ARRAYREF would
  have to be outlawed for a properly canonical form. The exact same logic
-applies to HASH vs. HASHREF.
+applies to HASH vs. HASHREF. This behavior can be overridden by the
+C<canonical_refs> option, which disables use of HASHREF and ARRAYREF.
+
+=item Numeric representation
  
  Similar to how Sereal can represent arrays and hashes in a full and a compact
  form. For small integers (between -16 and +15 inclusive), Sereal emits only
@@ -571,7 +618,7 @@ strings due to insignificant 'noise' in the floating point representation. Serea
  supports different floating point precisions and will generally choose the most
  compact that can represent your floating point number correctly.
  
-These issues are especially relevant when considering language interoperability.
+=back
  
  =back
  
@@ -579,7 +626,7 @@ Often, people don't actually care about "canonical" in the strict sense
  required for real I<identity> checking. They just require a best-effort sort of
  thing for caching. But it's a slippery slope!
  
-In a nutshell, the C<sort_keys> option may be sufficient for an application
+In a nutshell, the C<canonical> option may be sufficient for an application
  which is simply serializing a cache key, and thus there's little harm in an
  occasional false-negative, but think carefully before applying Sereal in other
  use-cases.