* Warning: For a seamless upgrade, upgrade to version 3
* of the decoder before upgrading to version 3 of the
* encoder!
-3.001_011 Tues, Aug 12 2014
- - Remove use of defined-or in t/lib/TestSet.pm
-3.001_010 Tues, Aug 12 2014
- - Cleanup and enhance the "alternates" testing in t/010_desperate.t
+3.002 Aug 20 2014
+ Summary of changes from 3.001 - 3.002
+ - Introduce "canonical" option to encoder
+ - Introduce "canonical_refs" option to encoder
-3.001_007 .. 3.001_009
- - Try to fix t/010_desperate.t on threaded perls (yes that many releases. sigh)
+ * Test Infra Changes
+ - Split up bulk tests to speed up testing and make it easier
+ to see when a failure is restricted to a specific option.
-3.001_006 Sun, Aug 03 2014
- - Rework bulk tests so we test more, but report less tests.
- The test infrastructure doesn't play well with lots of tests
- in a file. Similarly, if we fail one of the methods in the bulk
- tests we stop testing the rest.
- - Add a canonical mode to the encoder.
- - More tests.
-
-3.001_005 Mon, July 28 2014
- - Fixup how MakeMaker runs the tests.
-
-3.001_004 Sun, July 27 2014
- - Rework bulk tests so that tests are grouped by options and version
- - Fixups for non-x86 architectures.
-
-3.001_003
-3.001_002
- - Attempts to fix builds on sparc, s390x and ARM.
-
-3.001_001
- - Patches from Jarkko Hietaniemi to make Sereal pass test
- on HP-UX, and other machines with endian or alignedness
- issues. Thanks to H.Merijn Brand for assisting and providing
- access to test machines.
+ * Big-Endian Support
+ - Improved support for Big-Endian machines. We now build and pass tests
+ on Sparc and HP-UX and other platforms with big-endian or strict
+ alignment requirements. Many thanks to Jarkko Hietaniemi,
+ Gregor Herrmann, and H. Merijn Brand for their assistance with
+ this.
+ - We still have issues with Sereal on s390x (z/OS). If someone wants
+ to help, it would be appreciated.
3.001
- - Production release 1 of protocol version 3
- - Zlib support
- - CANONICAL_UNDEF,
- - new magic header to make it easier to detect
- UTF8 encoded data.
- - Minor changes to how scalar values are serialized
- to favour more compact representations.
-
-3.000_004
- - Fix issues in new serialization rules with tied arrays
- on older Perls revealed by breakage in t/400_evil.t
-
-3.000_003
- - Sync release with decoder.
-
-3.000_002
- - Minor protocol changes to magic header definition.
-
-3.000_001
- Upgrade to version 3 of the protocol
* Add Zlib compression support to the protocol
* Add Zlib support to Encoder/Decoder
(this is to handle engineering notation like "0E0"
where numeric and string equivalence may differ)
-2.12 Sun May 11 23:30
- - Synchronization release with Decoder changes.
-
-2.11 Sun Apr 13 23:04
- - Work around regression in Perl 5.16.3 - 5.17.0
- As of 8ae39f603f0f5778c160e18e08df60 while each
- automagically becomes while $_= defined(each);
- which manages to break some of our test code.
-
-2.10 Sun Apr 13 21:30
- - Fix broken MANIFEST
-
-2.09 Sun Apr 13 21:15
- - Synchro release with Decoder change.
-
-2.08 Thu Apr 10 22:10 2013
- - Production release for previous changes.
-
-2.070_103 Wed Apr 09 00:33 2013 * DEV RELEASE *
- - Synchronization release with Decoder. No changes.
-
-2.070_102 Sun Apr 06 17:27 2013 * DEV RELEASE *
- - Fixes for how we load XS so Sereal.pm works properly
- with dev releases.
-
-2.070_101 Sun Apr 06 17:27 2013 * DEV RELEASE *
- - Fix for newer perls.
- - Changes to 'fixver.pl' and version numbering so we do
- a 3 digit minor version, and a 3 digit dev version,
- so once this dev release cycle is done we will be at
- v2.071 everywhere. This eliminates a version numbering
- inconsistency in Sereal.pm from Encoder.pm and Decoder.pm
-
-2.07_01 Wed Mar 26 18:10 2014 * DEV RELEASE *
- - Fix for aliased_dedupe_strings feature (Borislav Nikolov)
- - Add sereal_decode_with_object(), a functional/custom-opcode
- implementation of the OO interface, with much less overhead.
- In practice this will make a very modest impact on dumping,
- but if your applications needs it...
- Thanks to Zefram for the custom op implementation.
- - Optimize dumping hashes by being more careful how we
- check if they have backreferences, and avoid creating
- a HvAUX() structure (and thus reallocing the hashes bucket
- array) just to find out if they have backreferences.
- Reported by Steffen.
-
-2.06 Sun Mar 0 11:40 2014 (AMS time)
- - Only minor changes.
-
-2.05 Fri Mar 7 10:30 2014 (AMS time)
- - Fix rt.cpan.org #93560 - Encoder object wasn't re-entrant from
- FREEZE calls.
-
-2.04 Wed Mar 5 18:15 2014 (AMS time)
- - Fix rt.cpan.org #93484 - fencepost error in Encoder.xs (Zefram)
-
-2.03 Tue Jan 7 20:00 2014 (AMS time)
- - (Hopefully) final fixes to FREEZE/THAW functionality:
- => Add safe assertion to make sure that we don't segfault on invalid
- data.
- => Fix encoding/decoding of data structures with repeated references
- to the same instance of a class that has FREEZE/THAW hooks.
- Thanks to Christian Hansen for a test case.
- - Distribution dependency fix.
-
-2.02 Mon Jan 6 15:00 2014 (AMS time)
- - Fundamental fixes for FREEZE/THAW support in previous Sereal v2
- releases. If you plan to use FREEZE/THAW, make sure you have 2.02
- or better (dito for the decoder).
-
-2.01 Tue Dec 31 08:15 2013 (AMS time)
- - Promoting changes from 0.37 to 2.00_03 to a stable release.
- (This being the first protocol v2 stable release.)
- - Minor performance tweaks.
-
-2.00_03 Sun Dec 29 10:33 2013 (AMS time)
- - FREEZE/THAW hooks for object serialization.
- - Test improvements (allowing for partial parallel run)
- - Minor optimizations.
-
-2.00_02 Mon Oct 28 19:32 2013 (AMS time)
- - Sereal::Encoder now requires Sereal::Decoder for better testing.
- - Fix Test::Warn dependency problem of 2.00_01.
-
-2.00_01 tue Oct 1 07:34 2013 (AMS time)
- - NEW PROTOCOL VERSION: V2
- - User-data in header functionality: You may embed arbitrary
- Sereal-serializable data in a document header. The document
- header isn't compressed, so this is ideal for retrieving
- small chunks of meta-data (eg. routing information) without
- having to deserialize the entire document.
- - Relocatable Sereal document bodies
- - Encoder never emits non-incremental Snappy encoding for V2
- - Offsets now 1-based in relocatable format, not 0
- - Fixed VERY obscure (and rare) memory leak.
- - Improved error messages
- - Remove warning about Sereal not being production-grade
- (because it IS).
- - Detect when the Snappy compression was net negative in size
- and back out
- - C89/Windows fixes (bulk88)
- - 5.18 compat: Skip test failing due to hash-randomization (Zefram)
-
-0.37 Mon Sep 2 07:40 2013 (AMS time)
- - Windows and C89 fixes
- - Band-aid: Skip test failing due to hash-randomization [Zefram]
-
-0.36 Tue May 7 12:00 2013 (AMS time)
- [changelog for encoder and decoder both]
- - Add "incremental" option to decoder for easier decoding of
- multiple sereal documents in one buffer.
- - Make snappy and snappy_incr options mutually exclusive.
- - Feature: Implement aliasing for deduping (aliased_dedupe_strings)
-
-0.35 Mon Apr 1 11:50 2013 (AMS time)
- - Add new no_bless_objects option from Simon Bertrang.
-
-0.34 Sat Mar 23 18:59:18 2013 (AMS time)
- - Fixup Manifest
-
-0.33 Sun Feb 17 17:26 2013 (AMS time)
- - Fix problem with hv_backrefs (Issue #27)
-
-0.32 Sun Feb 17 15:06 2013 (AMS time)
- - Add "dedupe_strings" option, which will make
- the encoder do extra work to dedupe string values
- in the serialized output.
-
-0.31 Sun Feb 17 15:06 2013 (AMS time)
- - Daniel Dragan <bulk88@hotmail.com> spent a bunch of time
- digging into the weird problems we were having with Snappy
- encoded data on Windows on certain builds. Turned out that
- it was right broken, and worked sometimes purely by chance.
- He kindly provided a patch.
-
-0.30 Wed Feb 13 06:21 2013 (AMS time)
- - Found a work around for VC6 Windows 32 bit builds
- Compile was "optimizing" float comparisons to use 80 bit precision
- regardless of type, this release uses a workaround of marking the
- relevant vars "volatile".
-
-0.29 Sat Feb 09 18:09 2013 (AMS time)
- - Dummy release to keep Encoder in sync with Decoder.
-
-0.28 Sat Feb 09 16:20 2013 (AMS time)
- - More fixups for building on Win32/C89 compilers
- - Eliminate unnecessary use of strlen.
-
-0.27 Sat Feb 09 12:58 2013 (AMS time)
- - Various fixups to improve building on Win32
- - Fix C89 violations
-
-0.26 Sun Feb 03 13:45 2013
- - Compatibility with perl 5.17.6 (5.18-to-be) regarding regular
- expression encoding.
- - Fixed Changelog order (why would I ever have listed oldest first?)
-
-0.25 Tue Jan 22 18:00 2013
- - Various compatibility fixes with old versions of Perl.
- Specifically, fixes to regular expression handling that should help
- with 5.10 support, as well as fixes that should improve the status
- quo on 5.8.
- - Potential fix for a leak wrt. regular expression support.
- - Fewer compiler warnings on 32bit/gcc.
-
-0.24 - unreleased
-
-0.23 Tue Jan 08 07:23 2013
- * Important bug fix release *
- * Warning *
- Before using the incremental Snappy mode described below, you must
- upgrade the Sereal::Decoder to version 0.23 or higher!
- - Support for the 'snappy_incr' option, which uses a new Snappy
- compression format that is suitable for parsing multiple Sereal
- documents from a large buffer. A bug in the previous implementation
- of Snappy-compression resulted in the Decoder failing if the
- buffer (Perl input string) extended beyond the length of the
- Snappy-compressed Sereal document.
- If this confuses you, then:
- => If you're not using Snappy compression, move on.
- => If you are, but you're not extracting Sereal documents
- from larger strings, consider upgrading or move on.
- => If you're using Snappy compression and might want to extract
- Sereal documents from larger strings, then please:
- 1) Upgrade Sereal::Decoder and Sereal::Encoder everywhere.
- 2) Then swap the "snappy" option of the encoder for the
- "snappy_incr" option.
- - Support for the 'sort_keys' option, which outputs hash keys in
- consistent order (but see gotchas in documentation).
-
-0.22 - unreleased
-0.21 - unreleased
-
-0.20 Fri Nov 23 15:35 2012
- - Configurable recursion limit for the Encoder.
- - Fix hard-crash issue with weak-refs to certain data structures
- (issue #11 on github). Thanks to Andrew Yates for helping us debug
- the problem!
- => Regression tests still pending.
-
-0.19 - unreleased
-
-0.18 Wed Nov 14 07:30 2012
- * This release contains critical bug fixes *
- - Fix output data corruption in encoder when serializing an incompatible
- data structure with refcount > 1 with the "stringify_unknown" option.
-
-0.17 Mon Oct 29 12:00 2012
- * This release contains critical bug fixes *
- - Fix pointer-stashing-broken-by-realloc-from-under-it problem by
- using offsets instead.
- This bug could cause you Perl to segfault.
-
-0.16 Thu Oct 25 12:00 2012
- - Re-entrancy fix for obscure cases like calling into Sereal from
- $SIG{__DIE__} if the exception was thrown from within Sereal.
- (A bit of a "don't do that" case)
-
-0.15 Wed Oct 17 13:00 2012
- - Thread-safety fix on Perls >= 5.8.7. Sereal is still not thread-safe
- on older Perls
-
-0.14 Wed Oct 10 11:11 2012
- - The 'warn_unknown' option now optionally does NOT emit a warning
- if the unsupported item is a blessed object with string overloading.
-
-0.13 - unreleased
-
-0.12 Wed Sep 19 08:00 2012
- * Important bug fix *
- - Under certain circumstances, an encoder object could be left
- in an unclean state when an encoding operation failed via
- an exception.
-
-0.11 Tue Sep 18 13:00 2012
- - 5.8.5 fixes.
- - Fixes to other languages' reference data output.
-
-0.10 Mon Sep 17 14:00 2012
- - Perl 5.10 regular-expression-related build fixes.
-
-0.09 Fri Sep 14 10:00 2012
- - Export functions by default when loaded from one liner
- - More liberal set of decoder versions that we can run full tests against
-
-0.08 Thu Sep 13 17:00 2012
- - 'snappy_threshold' option which controls at which minimum packet size
- we start compressing with Snappy at all (if Snappy enabled)
- - More tests.
-
-0.07 Tue Sep 11 14:00 2012
- - "undef_unknown" option will cause unsupported Perl types to be
- encoded as "undef" instead of throwing an exception.
- - Similarly, "stringify_unknown" will make those unsupported types
- be stringified instead. The two options are mutually exclusive.
- - "warn_unknown" option (only meaningful if "stringify_unknown" or
- "undef_unknown" are active) will cause a warning to be issued when
- an unsupported type is encoded as a string or as undef.
- - Bug fixes for encoding the contents of tied hashes (the tiedness
- itself is not preserved by design).
- - Solaris build fix.
- - Test fixes for threaded perls (likely working around a bug in Perl
- - Improved documentation.
-
-0.06 Mon Sep 10 11:00 2012
- - First public release (CPAN).
- - Beta quality software.
-
-0.05 Fri Sep 7 14:00 2012
- - internal release.
-
-0.04 Thu Sep 6 16:00 2012
- - internal release.
-
-0.03 Tue Sep 4 17:09 2012
- - internal release.
-
-0.02 Tue Aug 8 17:09 2012
- - internal release.
+Full change history available at https://github.com/Sereal/Sereal
-0.01 Tue Aug 8 17:09 2012
- - original version; internal release.
use Carp qw/croak/;
use XSLoader;
-our $VERSION = '3.001_012'; # Don't forget to update the TestCompat set for testing against installed decoders!
+our $VERSION = '3.002'; # Don't forget to update the TestCompat set for testing against installed decoders!
our $XS_VERSION = $VERSION; $VERSION= eval $VERSION;
# not for public consumption, just for testing.
=head1 SYNOPSIS
use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);
-
+
my $encoder = Sereal::Encoder->new({...options...});
my $out = $encoder->encode($structure);
-
+
# alternatively the functional interface:
$out = sereal_encode_with_object($encoder, $structure);
-
+
# much slower functional interface with no persistent objects:
$out = encode_sereal($structure, {... options ...});
it if you serialize a single very large data structure just once to free
the memory.
-See L</NON-CANONICAL> for why you might want to use this, and for the
-various caveats involved.
+See L</CANONICAL REPRESENTATION> for why you might want to use this, and
+for the various caveats involved.
=head3 no_shared_hashkeys
package
File;
-
+
use Moo;
-
+
has 'path' => (is => 'ro');
has 'fh' => (is => 'rw');
-
+
# open file handle if necessary and return it
sub get_fh {
my $self = shift;
}
return $fh;
}
-
+
sub FREEZE {
my ($self, $serializer) = @_;
# Could switch on $serializer here: JSON, CBOR, Sereal, ...
# to recreate.
return $self->path;
}
-
+
sub THAW {
my ($class, $serializer, $data) = @_;
# Turn back into object.
thread. This might change in a future release to become a full clone
of the encoder object.
-=head1 NON-CANONICAL
+=head1 CANONICAL REPRESENTATION
You might want to compare two data structures by comparing their serialized
byte strings. For that to work reliably the serialization must take extra
steps to ensure that identical data structures are encoded into identical
serialized byte strings (a so-called "canonical representation").
-Currently the Sereal encoder I<does not> provide a mode that will reliably
-generate a canonical representation of a data structure. The reasons are many
-and sometimes subtle.
-
-Sereal does support some use-cases however. In this section we attempt to outline
-the issues well enough for you to decide if it is suitable for your needs.
+Unfortunately in Perl there is no such thing as a "canonical representation".
+Most people are interested in "structural equivalence" but even that is less
+well defined than most people think. For instance in the following example:
+
+ my $array1= [ 0, 0 ];
+ my $array2= do {
+ my $zero= 0;
+ sub{ \@_ }->($zero,$zero);
+ };
+
+the question of whether C<$array1> is structurally equivalent to C<$array2>
+is a subjective one. Sereal for instance would B<NOT> consider them
+equivalent but C<Test::Deep> would. There are many examples of this in
+Perl. Simply stringifying a number technically changes the scalar. Storable
+would notice this, but Sereal generally would not.
+
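The difference between the two arrays can be made visible with plain Perl.
The following is a minimal sketch using only core features: it compares the
addresses of the element scalars, which suggests why a serializer that tracks
scalar identity may treat the two arrays differently. The helper name
C<elements_aliased> is made up for this illustration and is not part of Sereal.

```perl
use strict;
use warnings;

my $array1 = [ 0, 0 ];
my $array2 = do {
    my $zero = 0;
    sub { \@_ }->($zero, $zero);
};

# Hypothetical helper (not part of Sereal): true if the first two
# elements of the array are one and the same scalar.
sub elements_aliased {
    my ($array) = @_;
    return \$array->[0] == \$array->[1] ? 1 : 0;
}

printf "array1 aliased: %d\n", elements_aliased($array1);
printf "array2 aliased: %d\n", elements_aliased($array2);

# Writing through one element of the aliased array also changes the other.
$array2->[0] = 42;
printf "array2 after write: %s\n", join(",", @$array2);
```

Both arrays print as two zeros before the write, yet only C<$array2> contains
the same scalar twice.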
+Despite this, as of 3.002 the Sereal encoder supports a "canonical" option
+which will make a "best effort" attempt at producing a canonical
+representation of a data structure. This mode is actually a combination of
+several other modes which may also be enabled independently. As and when
+we add new options to the encoder that would assist in this regard, the
+C<canonical> option will also enable them. These options may come with a
+performance penalty, so care should be taken to read the Changes file and
+test the performance implications when upgrading a system that uses this
+option.
+
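As a rough illustration, the option can be passed to the constructor like any
other encoder option. The sketch below assumes Sereal::Encoder 3.002 or later
is installed and skips itself otherwise; the variable names are chosen for
this example only. It encodes two hashes with the same content that were
populated in a different key order and compares the resulting byte strings.

```perl
use strict;
use warnings;

# Assumption: Sereal::Encoder 3.002 or later is installed.
my $have_sereal = eval {
    require Sereal::Encoder;
    Sereal::Encoder->VERSION('3.002');
    1;
};

my $same_canonical;
if ($have_sereal) {
    my $encoder = Sereal::Encoder->new({ canonical => 1 });

    # Same content, populated in a different key order.
    my $first  = {};
    my $second = {};
    $first->{$_}  = 1 for qw(alpha beta gamma delta);
    $second->{$_} = 1 for qw(delta gamma beta alpha);

    $same_canonical =
        $encoder->encode($first) eq $encoder->encode($second) ? 1 : 0;
    print "canonical encodings identical: $same_canonical\n";
}
else {
    print "Sereal::Encoder >= 3.002 not available, skipping\n";
}
```

For simple data like this the two encodings would be expected to match, but
this remains a best-effort guarantee, so any real use should be tested
against representative data.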
+It is important to note that using canonical representation to determine
+if two data structures are different is subject to false-positives. If
+two Sereal encodings are identical you can generally assume that the
+two data structures are functionally equivalent from the point of view of
+normal Perl code (XS code might disagree). However if two Sereal
+encodings differ the data structures may actually be functionally
+equivalent. In practice it seems that the false-positive rate is low,
+but your mileage may vary.
+
+Some of the issues with producing a true canonical representation are
+outlined below:
=over 4
=item Sereal doesn't order the hash keys by default.
-This can be enabled via C<sort_keys>, see above.
+This can be enabled via the C<sort_keys> option, which is itself enabled by
+the C<canonical> option.
=item Sereal output is sensitive to refcounts
=item There are multiple valid Sereal documents that you can produce for the same Perl data structure.
-Just L<sorting hash keys|/sort_keys> is not enough. A trivial example is PAD bytes which
+Just L<sorting hash keys|/sort_keys> is not enough. Some of the reasons
+are outlined below. These issues are especially relevant when considering
+language interoperability.
+
+=over 4
+
+=item PAD bytes
+
+A trivial example is PAD bytes which
mean nothing and are skipped. They mostly exist for encoder optimizations to
prevent certain nasty backtracking situations from becoming O(n) at the cost of
one byte of output. An explicit canonical mode would have to outlaw them (or
operations to go from O(1) to a full memcpy of everything after the point of
where we backtracked to. Nasty.
+=item COPY tag
+
Another example is COPY. The COPY tag indicates that the next element is an
identical copy of a previous element (which is itself forbidden from including
COPY's other than for class names). COPY is purely internal. The Perl/XS
other strings (theoretically), but doesn't for time-efficiency reasons. We'd
have to outlaw the use of this (significant) optimization of canonicalization.
+=item REF representation
+
Sereal represents a reference to an array as a sequence of
tags which, in its simplest form, reads I<REF, ARRAY $array_length TAG1 TAG2 ...>.
The separation of "REF" and "ARRAY" is necessary to properly implement all of
for common cases. This, however, does mean that most arrays up to 15 elements
could be represented in two different, yet perfectly valid forms. ARRAYREF would
have to be outlawed for a properly canonical form. The exact same logic
-applies to HASH vs. HASHREF.
+applies to HASH vs. HASHREF. This behavior can be overridden by the
+C<canonical_refs> option, which disables use of HASHREF and ARRAYREF.
+
+=item Numeric representation
Similar to how Sereal can represent arrays and hashes in a full and a compact
form. For small integers (between -16 and +15 inclusive), Sereal emits only
supports different floating point precisions and will generally choose the most
compact that can represent your floating point number correctly.
-These issues are especially relevant when considering language interoperability.
+=back
=back
required for real I<identity> checking. They just require a best-effort sort of
thing for caching. But it's a slippery slope!
-In a nutshell, the C<sort_keys> option may be sufficient for an application
+In a nutshell, the C<canonical> option may be sufficient for an application
which is simply serializing a cache key, and thus there's little harm in an
occasional false-negative, but think carefully before applying Sereal in other
use-cases.
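
The cache-key use-case could look roughly like the following sketch. It
assumes Sereal::Encoder 3.002 or later is installed (and skips itself
otherwise) and uses the core C<Digest::SHA> module to shorten the encoded
bytes. The helper C<cache_key_for> and the C<%cache> hash are hypothetical
names for this illustration, not part of Sereal.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Assumption: Sereal::Encoder 3.002 or later is installed.
my $have_sereal = eval {
    require Sereal::Encoder;
    Sereal::Encoder->VERSION('3.002');
    1;
};

# Hypothetical helper: derive a cache key from a request structure.
sub cache_key_for {
    my ($encoder, $request) = @_;
    return sha1_hex($encoder->encode($request));
}

my %cache;
if ($have_sereal) {
    my $encoder = Sereal::Encoder->new({ canonical => 1 });

    my $key = cache_key_for($encoder, { user => 42, page => 3 });
    $cache{$key} = "expensive result" unless exists $cache{$key};

    # A later, structurally similar request may or may not map to the
    # same key; a miss only costs a recomputation.
    my $again = cache_key_for($encoder, { page => 3, user => 42 });
    print exists $cache{$again} ? "cache hit\n" : "cache miss\n";
}
```

An occasional miss here is harmless, which is why this kind of application
tolerates the best-effort nature of the C<canonical> option.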