<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Tickle</title>
	<atom:link href="http://blog.w-nz.com/archives/2007/12/28/tickle/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.w-nz.com/archives/2007/12/28/tickle/</link>
	<description>A few thoughts</description>
	<pubDate>Thu, 21 Aug 2008 22:21:49 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>By: Alexandre Vassalotti</title>
		<link>http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159293</link>
		<dc:creator>Alexandre Vassalotti</dc:creator>
		<pubDate>Sun, 30 Dec 2007 06:01:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159293</guid>
		<description>&lt;blockquote&gt;Maybe writing a dialect of Tickle that is very compressible would be nice. I guess using one-byte markers to mark the beginning and end of data would be very compressible. E.g.: [here starts a tuple] [an int] 5 [an int] 5 … [here ends the tuple], etc. Basically repr, but with binary data and no extra spaces.&lt;/blockquote&gt;

The format you are describing is the one pickle already uses. In protocol 0, a tuple of three 5s is serialized as:
&lt;code&gt;(I5
I5
I5
t.&lt;/code&gt;
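Or, in binary protocol 1:
&lt;code&gt;(K\x05K\x05K\x05t.&lt;/code&gt;
(where ( pushes a mark, K pushes a 1-byte unsigned integer, t builds a tuple from everything above the mark, and . ends the pickle; the memo opcodes that pickle also emits are left out)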

If you are curious, I have &lt;a href="http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/" rel="nofollow"&gt;an article about how pickle works.&lt;/a&gt; It doesn't cover the details of the newer protocols, but it should give you a good starting point for learning them. Anyway, I am currently working on the next version of the pickle protocol for Python 3K, so if you are interested in helping, just send me an email.</description>
		<content:encoded><![CDATA[<blockquote><p>Maybe writing a dialect of Tickle that is very compressible would be nice. I guess using one-byte markers to mark the beginning and end of data would be very compressible. E.g.: [here starts a tuple] [an int] 5 [an int] 5 … [here ends the tuple], etc. Basically repr, but with binary data and no extra spaces.</p></blockquote>
<p>The format you are describing is the one pickle already uses. In protocol 0, a tuple of three 5s is serialized as:<br />
<code>(I5<br />
I5<br />
I5<br />
t.</code><br />
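Or, in binary protocol 1:<br />
<code>(K\x05K\x05K\x05t.</code><br />
(where ( pushes a mark, K pushes a 1-byte unsigned integer, t builds a tuple from everything above the mark, and . ends the pickle; the memo opcodes that pickle also emits are left out)</p>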
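<p>To inspect these opcodes yourself, the standard <code>pickletools</code> module can disassemble a pickle. This sketch assumes a modern Python 3 interpreter rather than the Python 2 used in this thread, so the exact bytes may differ slightly from the strings above (pickle also writes a memo entry after the tuple, for instance).</p>

```python
import io
import pickle
import pickletools

value = (5, 5, 5)

# Protocol 0 is the text format: MARK, three INT opcodes, TUPLE, ...
text_pickle = pickle.dumps(value, protocol=0)
# Protocol 1 is binary: MARK, three BININT1 ('K') opcodes, TUPLE, ...
binary_pickle = pickle.dumps(value, protocol=1)


def disassemble(data):
    """Return the pickletools.dis listing of a pickle as a string."""
    out = io.StringIO()
    pickletools.dis(data, out=out)
    return out.getvalue()


print(text_pickle)
print(disassemble(binary_pickle))
```

<p>Each line of the listing shows the offset, the opcode byte, the opcode name and its argument, which makes it easy to compare protocols side by side.</p>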
<p>If you are curious, I have <a href="http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/" rel="nofollow">an article about how pickle works.</a> It doesn&#8217;t cover the details of the newer protocols, but it should give you a good starting point for learning them. Anyway, I am currently working on the next version of the pickle protocol for Python 3K, so if you are interested in helping, just send me an email.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bas Westerbaan</title>
		<link>http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159263</link>
		<dc:creator>Bas Westerbaan</dc:creator>
		<pubDate>Sat, 29 Dec 2007 12:09:30 +0000</pubDate>
		<guid isPermaLink="false">http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159263</guid>
		<description>Oh, wait: looking even further, Tickle wins again. I guess Tickle's encoding of big strings and big ints compresses better than pickle's:

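&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Elements&lt;/th&gt;&lt;th&gt;pickle (bytes)&lt;/th&gt;&lt;th&gt;pickle, fast (bytes)&lt;/th&gt;&lt;th&gt;tickle (bytes)&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1024&lt;/td&gt;&lt;td&gt;5514&lt;/td&gt;&lt;td&gt;3351&lt;/td&gt;&lt;td&gt;3367&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;2048&lt;/td&gt;&lt;td&gt;10918&lt;/td&gt;&lt;td&gt;6910&lt;/td&gt;&lt;td&gt;6414&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;4096&lt;/td&gt;&lt;td&gt;20763&lt;/td&gt;&lt;td&gt;12792&lt;/td&gt;&lt;td&gt;11833&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;8192&lt;/td&gt;&lt;td&gt;39492&lt;/td&gt;&lt;td&gt;23893&lt;/td&gt;&lt;td&gt;22166&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;16384&lt;/td&gt;&lt;td&gt;77385&lt;/td&gt;&lt;td&gt;46063&lt;/td&gt;&lt;td&gt;44242&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;32768&lt;/td&gt;&lt;td&gt;152429&lt;/td&gt;&lt;td&gt;89529&lt;/td&gt;&lt;td&gt;86426&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;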

Or perhaps pickle simply has a very compressible tuple format.</description>
		<content:encoded><![CDATA[<p>Oh, wait: looking even further, Tickle wins again. I guess Tickle&#8217;s encoding of big strings and big ints compresses better than pickle&#8217;s:</p>
<table>
<tr><th>Elements</th><th>pickle (bytes)</th><th>pickle, fast (bytes)</th><th>tickle (bytes)</th></tr>
<tr><td>1024</td><td>5514</td><td>3351</td><td>3367</td></tr>
<tr><td>2048</td><td>10918</td><td>6910</td><td>6414</td></tr>
<tr><td>4096</td><td>20763</td><td>12792</td><td>11833</td></tr>
<tr><td>8192</td><td>39492</td><td>23893</td><td>22166</td></tr>
<tr><td>16384</td><td>77385</td><td>46063</td><td>44242</td></tr>
<tr><td>32768</td><td>152429</td><td>89529</td><td>86426</td></tr>
</table>
<p>Or perhaps pickle simply has a very compressible tuple format.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bas Westerbaan</title>
		<link>http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159262</link>
		<dc:creator>Bas Westerbaan</dc:creator>
		<pubDate>Sat, 29 Dec 2007 12:06:44 +0000</pubDate>
		<guid isPermaLink="false">http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159262</guid>
		<description>(&lt;code&gt;tickle&lt;/code&gt; accepts a file through its &lt;code&gt;stream&lt;/code&gt; parameter, and &lt;code&gt;untickle&lt;/code&gt; accepts either a stream or a str.)

Now I wonder how it depends on the size:

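&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Elements&lt;/th&gt;&lt;th&gt;pickle (bytes)&lt;/th&gt;&lt;th&gt;pickle, fast (bytes)&lt;/th&gt;&lt;th&gt;tickle (bytes)&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;37&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;td&gt;27&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;48&lt;/td&gt;&lt;td&gt;38&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;68&lt;/td&gt;&lt;td&gt;50&lt;/td&gt;&lt;td&gt;43&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;99&lt;/td&gt;&lt;td&gt;71&lt;/td&gt;&lt;td&gt;62&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;16&lt;/td&gt;&lt;td&gt;153&lt;/td&gt;&lt;td&gt;103&lt;/td&gt;&lt;td&gt;91&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;32&lt;/td&gt;&lt;td&gt;254&lt;/td&gt;&lt;td&gt;150&lt;/td&gt;&lt;td&gt;147&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;64&lt;/td&gt;&lt;td&gt;467&lt;/td&gt;&lt;td&gt;235&lt;/td&gt;&lt;td&gt;253&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;128&lt;/td&gt;&lt;td&gt;952&lt;/td&gt;&lt;td&gt;431&lt;/td&gt;&lt;td&gt;489&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;256&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;td&gt;834&lt;/td&gt;&lt;td&gt;974&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;512&lt;/td&gt;&lt;td&gt;3959&lt;/td&gt;&lt;td&gt;1739&lt;/td&gt;&lt;td&gt;1782&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;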

This uses the same test object, but with a varying number of elements. The first column is the number of elements; the second is the gzip-compressed size of the pickle stream, the third the same with pickle's fast attribute set, and the last the compressed size of the tickle stream.

Maybe writing a dialect of Tickle that is very compressible would be nice. I guess using one-byte markers to mark the beginning and end of data would be very compressible. E.g.: [here starts a tuple] [an int] 5 [an int] 5 ... [here ends the tuple], etc. Basically repr, but with binary data and no extra spaces.</description>
		<content:encoded><![CDATA[<p>(<code>tickle</code> accepts a file through its <code>stream</code> parameter, and <code>untickle</code> accepts either a stream or a str.)</p>
<p>Now I wonder how it depends on the size:</p>
<table>
<tr><th>Elements</th><th>pickle (bytes)</th><th>pickle, fast (bytes)</th><th>tickle (bytes)</th></tr>
<tr><td>1</td><td>37</td><td>31</td><td>27</td></tr>
<tr><td>2</td><td>48</td><td>38</td><td>33</td></tr>
<tr><td>4</td><td>68</td><td>50</td><td>43</td></tr>
<tr><td>8</td><td>99</td><td>71</td><td>62</td></tr>
<tr><td>16</td><td>153</td><td>103</td><td>91</td></tr>
<tr><td>32</td><td>254</td><td>150</td><td>147</td></tr>
<tr><td>64</td><td>467</td><td>235</td><td>253</td></tr>
<tr><td>128</td><td>952</td><td>431</td><td>489</td></tr>
<tr><td>256</td><td>2005</td><td>834</td><td>974</td></tr>
<tr><td>512</td><td>3959</td><td>1739</td><td>1782</td></tr>
</table>
<p>This uses the same test object, but with a varying number of elements. The first column is the number of elements; the second is the gzip-compressed size of the pickle stream, the third the same with pickle&#8217;s fast attribute set, and the last the compressed size of the tickle stream.</p>
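<p>A rough way to reproduce the pickle columns of such a table on Python 3 is sketched below. It assumes the test object from this thread (a list of <code>(i, str(i))</code> tuples), protocol 2 and gzip compression; Tickle itself is not included, and the byte counts will not match the Python 2 numbers exactly.</p>

```python
import gzip
import io
import pickle


def make_obj(n):
    """The test object used in this thread: n (int, str) pairs."""
    return [(i, str(i)) for i in range(n)]


def pickled(obj, fast=False):
    """Pickle obj with protocol 2, optionally in (deprecated) fast mode."""
    buf = io.BytesIO()
    p = pickle.Pickler(buf, 2)
    p.fast = fast  # fast mode skips the memo, so no PUT opcodes are written
    p.dump(obj)
    return buf.getvalue()


def gzipped_size(data):
    return len(gzip.compress(data))


def size_table(sizes):
    """Rows of (elements, gzipped pickle size, gzipped fast-pickle size)."""
    rows = []
    for n in sizes:
        obj = make_obj(n)
        rows.append((n, gzipped_size(pickled(obj)),
                     gzipped_size(pickled(obj, fast=True))))
    return rows


for row in size_table([1, 2, 4, 8, 16, 32, 64, 128, 256, 512]):
    print("%5d %6d %6d" % row)
```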
<p>Maybe writing a dialect of Tickle that is very compressible would be nice. I guess using one-byte markers to mark the beginning and end of data would be very compressible. E.g.: [here starts a tuple] [an int] 5 [an int] 5 &#8230; [here ends the tuple], etc. Basically repr, but with binary data and no extra spaces.</p>
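<p>A minimal sketch of such a marker-based dialect, restricted to ints, strs and tuples. The marker bytes, the length-prefixed value encodings and the <code>encode</code>/<code>decode</code> names are hypothetical choices for illustration, not part of Tickle or pickle. Encoding dispatches through a small table of functions keyed on type.</p>

```python
import struct

# Hypothetical one-byte markers for the dialect described above.
TUPLE_START = b"("
TUPLE_END = b")"
INT = b"i"
STR = b"s"


def _int_bytes(n):
    """Signed big-endian encoding of n, prefixed by its byte length."""
    length = max(1, (n.bit_length() + 8) // 8)  # +8 leaves room for the sign bit
    return struct.pack("!B", length) + n.to_bytes(length, "big", signed=True)


def _encode_int(n, out):
    out += INT + _int_bytes(n)


def _encode_str(s, out):
    data = s.encode("utf-8")
    out += STR + struct.pack("!I", len(data)) + data


def _encode_tuple(t, out):
    out += TUPLE_START
    for item in t:
        _encode(item, out)
    out += TUPLE_END


# A table of functions keyed on the exact type (so bool is not accepted).
ENCODERS = {int: _encode_int, str: _encode_str, tuple: _encode_tuple}


def _encode(obj, out):
    ENCODERS[type(obj)](obj, out)


def encode(obj):
    out = bytearray()
    _encode(obj, out)
    return bytes(out)


def _decode(data, pos):
    marker = data[pos:pos + 1]
    pos += 1
    if marker == INT:
        length = data[pos]
        value = int.from_bytes(data[pos + 1:pos + 1 + length], "big", signed=True)
        return value, pos + 1 + length
    if marker == STR:
        (length,) = struct.unpack_from("!I", data, pos)
        start = pos + 4
        return data[start:start + length].decode("utf-8"), start + length
    if marker == TUPLE_START:
        items = []
        while data[pos:pos + 1] != TUPLE_END:
            item, pos = _decode(data, pos)
            items.append(item)
        return tuple(items), pos + 1
    raise ValueError("unknown marker %r at offset %d" % (marker, pos - 1))


def decode(data):
    obj, pos = _decode(data, 0)
    if pos != len(data):
        raise ValueError("trailing data")
    return obj
```

<p>With this layout <code>encode((5, 5, 5))</code> is 11 bytes. Whether it actually compresses better than pickle in fast mode would still need to be measured.</p>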
]]></content:encoded>
	</item>
	<item>
		<title>By: Alexandre Vassalotti</title>
		<link>http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159252</link>
		<dc:creator>Alexandre Vassalotti</dc:creator>
		<pubDate>Sat, 29 Dec 2007 07:25:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159252</guid>
		<description>I was curious to see whether compressing was worthwhile, so I ran some tests.

&lt;code&gt;import pickle, tickle
from StringIO import StringIO
from gzip import GzipFile

def make_pickle_dump(fast=False):
&#160;&#160;def dump(obj, file):
&#160;&#160;&#160;&#160;p = pickle.Pickler(file, 2)
&#160;&#160;&#160;&#160;p.fast = fast
&#160;&#160;&#160;&#160;p.dump(obj)
&#160;&#160;return dump

pickle_dump = make_pickle_dump()
pickle_dump_fast = make_pickle_dump(fast=True)
pickle_load = pickle.load

def tickle_dump(obj, file):
&#160;&#160;file.write(tickle.tickle(obj))

def tickle_load(file):
&#160;&#160;return tickle.untickle(file.read())

def dump(obj, dump_method):
&#160;&#160;s = StringIO()
&#160;&#160;gz = GzipFile(fileobj=s, mode="wb")
&#160;&#160;dump_method(obj, gz)
&#160;&#160;gz.close()
&#160;&#160;return s.getvalue()

def load(data, load_method):
&#160;&#160;s = StringIO(data)
&#160;&#160;gz = GzipFile(fileobj=s, mode="rb")
&#160;&#160;return load_method(gz)&lt;/code&gt;

Again, using your example 'obj':

&lt;code&gt;&#62;&#62;&#62; len(dump(obj, pickle_dump))
717
&#62;&#62;&#62; len(dump(obj, pickle_dump_fast))
342
&#62;&#62;&#62; len(dump(obj, tickle_dump))
386&lt;/code&gt;

I am quite surprised by how well pickle streams generated with the 'fast' attribute compress. For tickle streams, however, the gain is small, which I presume means that your protocol is already compact.</description>
		<content:encoded><![CDATA[<p>I was curious to see whether compressing was worthwhile, so I ran some tests.</p>
<pre><code>import pickle, tickle
from StringIO import StringIO
from gzip import GzipFile

def make_pickle_dump(fast=False):
  def dump(obj, file):
    p = pickle.Pickler(file, 2)
    p.fast = fast
    p.dump(obj)
  return dump

pickle_dump = make_pickle_dump()
pickle_dump_fast = make_pickle_dump(fast=True)
pickle_load = pickle.load

def tickle_dump(obj, file):
  file.write(tickle.tickle(obj))

def tickle_load(file):
  return tickle.untickle(file.read())

def dump(obj, dump_method):
  s = StringIO()
  gz = GzipFile(fileobj=s, mode="wb")
  dump_method(obj, gz)
  gz.close()
  return s.getvalue()

def load(data, load_method):
  s = StringIO(data)
  gz = GzipFile(fileobj=s, mode="rb")
  return load_method(gz)</code></pre>
<p>Again, using your example &#8216;obj&#8217;:</p>
<p><code>&gt;&gt;&gt; len(dump(obj, pickle_dump))<br />
717<br />
&gt;&gt;&gt; len(dump(obj, pickle_dump_fast))<br />
342<br />
&gt;&gt;&gt; len(dump(obj, tickle_dump))<br />
386</code></p>
<p>I am quite surprised by how well pickle streams generated with the &#8216;fast&#8217; attribute compress. For tickle streams, however, the gain is small, which I presume means that your protocol is already compact.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bas Westerbaan</title>
		<link>http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159242</link>
		<dc:creator>Bas Westerbaan</dc:creator>
		<pubDate>Sat, 29 Dec 2007 02:31:18 +0000</pubDate>
		<guid isPermaLink="false">http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159242</guid>
		<description>I forgot to add HIGHEST_PROTOCOL. Tickle itself would also enter an infinite loop on cyclic references, so &lt;code&gt;.fast=True&lt;/code&gt; makes for a fairer comparison (thanks for noticing!). I wrote Tickle for an RPC module that relies on small messages with a predefined structure; for those, Tickle with its templates is much better suited than pickle, and it is easier to maintain and handle than using &lt;code&gt;pack&lt;/code&gt;.

&lt;strong&gt;gzip&lt;/strong&gt; probably doesn't make much of a difference on small RPC packets. Delta compression might, though (in this case just a fancy term for persisting the compressor's state between messages). I'll write two backends, one using Tickle and one 'GzPickle', to test it.

To make writing templates more convenient, I'll write a &lt;code&gt;pack&lt;/code&gt;-like format string that (un)tickle automatically converts into an appropriate tuple.

Written in pure Python, it's suboptimal anyway: using classes really doesn't make much of a difference compared with using a table of functions. I should write a C version of Tickle if it turns out to be useful.</description>
		<content:encoded><![CDATA[<p>I forgot to add HIGHEST_PROTOCOL. Tickle itself would also enter an infinite loop on cyclic references, so <code>.fast=True</code> makes for a fairer comparison (thanks for noticing!). I wrote Tickle for an RPC module that relies on small messages with a predefined structure; for those, Tickle with its templates is much better suited than pickle, and it is easier to maintain and handle than using <code>pack</code>.</p>
<p><strong>gzip</strong> probably doesn&#8217;t make much of a difference on small RPC packets. Delta compression might, though (in this case just a fancy term for persisting the compressor&#8217;s state between messages). I&#8217;ll write two backends, one using Tickle and one &#8216;GzPickle&#8217;, to test it.</p>
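<p>The idea of persisting the compressor state can be sketched with the standard <code>zlib</code> module: one compression object per connection, flushed with <code>Z_SYNC_FLUSH</code> after each message so the receiver can decode it immediately, while later messages can still refer back to bytes from earlier ones. The <code>StreamCompressor</code>/<code>StreamDecompressor</code> names and the sample messages are hypothetical, and this is not the GzPickle backend itself.</p>

```python
import pickle
import zlib


class StreamCompressor:
    """Compress a sequence of messages with one shared zlib state."""

    def __init__(self):
        self._c = zlib.compressobj()

    def pack(self, message):
        # Z_SYNC_FLUSH emits all pending output on a byte boundary without
        # resetting the history, so later messages can reuse earlier data.
        return self._c.compress(message) + self._c.flush(zlib.Z_SYNC_FLUSH)


class StreamDecompressor:
    """Receiving side: one decompressor kept alive for the whole stream."""

    def __init__(self):
        self._d = zlib.decompressobj()

    def unpack(self, chunk):
        return self._d.decompress(chunk)


def one_shot_size(message):
    """Size of the message compressed on its own, for comparison."""
    return len(zlib.compress(message))


# Small, similar RPC-style messages (hypothetical content).
messages = [pickle.dumps(("call", "get_user", i), 2) for i in range(5)]
sender, receiver = StreamCompressor(), StreamDecompressor()
chunks = [sender.pack(m) for m in messages]
received = [receiver.unpack(c) for c in chunks]

for m, c in zip(messages, chunks):
    print("raw %3d  alone %3d  streamed %3d" % (len(m), one_shot_size(m), len(c)))
```

<p>The first message costs about as much as compressing it alone; the gain shows up from the second message on, because each one is mostly a back-reference to its predecessor.</p>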
<p>To make writing templates more convenient, I&#8217;ll write a <code>pack</code>-like format string that (un)tickle automatically converts into an appropriate tuple.</p>
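<p>Tickle&#8217;s actual template representation isn&#8217;t shown in this thread, so the sketch below simply assumes a template is a (possibly nested) tuple of Python types. The format codes, <code>CODES</code>, <code>parse_format</code> and <code>matches</code> are all hypothetical; parentheses in the format string stand for a nested tuple.</p>

```python
# Hypothetical mapping from pack-like format codes to Python types.
CODES = {"i": int, "f": float, "s": str, "b": bool}


def _parse_group(fmt, pos):
    """Parse codes until a ')' or the end; return (template, new position)."""
    items = []
    while pos < len(fmt) and fmt[pos] != ")":
        ch = fmt[pos]
        if ch == "(":
            sub, pos = _parse_group(fmt, pos + 1)
            if pos >= len(fmt):
                raise ValueError("missing ')'")
            items.append(sub)
            pos += 1  # skip the closing ')'
        else:
            if ch not in CODES:
                raise ValueError("unknown format code %r" % ch)
            items.append(CODES[ch])
            pos += 1
    return tuple(items), pos


def parse_format(fmt):
    """Turn e.g. "i(is)" into the template (int, (int, str))."""
    fmt = fmt.replace(" ", "")
    template, pos = _parse_group(fmt, 0)
    if pos != len(fmt):
        raise ValueError("unexpected ')' at offset %d" % pos)
    return template


def matches(template, value):
    """Check a value against a template produced by parse_format."""
    if isinstance(template, tuple):
        return (isinstance(value, tuple) and len(value) == len(template)
                and all(matches(t, v) for t, v in zip(template, value)))
    return type(value) is template


print(parse_format("i(is)"))
```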
<p>Written in pure Python, it&#8217;s suboptimal anyway: using classes really doesn&#8217;t make much of a difference compared with using a table of functions. I should write a C version of Tickle if it turns out to be useful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alexandre Vassalotti</title>
		<link>http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159229</link>
		<dc:creator>Alexandre Vassalotti</dc:creator>
		<pubDate>Fri, 28 Dec 2007 22:51:50 +0000</pubDate>
		<guid isPermaLink="false">http://blog.w-nz.com/archives/2007/12/28/tickle/#comment-159229</guid>
		<description>You can significantly reduce the size of the pickle stream by using the newer binary protocols. Using your example:

&lt;code&gt;&#62;&#62;&#62; obj = []
&#62;&#62;&#62; for i in xrange(100):
...   obj.append((i, str(i)))
... 
&#62;&#62;&#62; len(pickle.dumps(obj))
2178
&#62;&#62;&#62; len(pickle.dumps(obj, 2))
1098&lt;/code&gt;

Anyway, I saw that you used &lt;code&gt;pickle.HIGHEST_PROTOCOL&lt;/code&gt; in your &lt;b&gt;tickle&lt;/b&gt; module, so you probably already knew this. Still, you can reduce the size even more by using the (undocumented) 'fast' attribute, which turns off memoization.

&lt;code&gt;&#62;&#62;&#62; s = StringIO()
&#62;&#62;&#62; p = pickle.Pickler(s, 2)
&#62;&#62;&#62; p.fast = True
&#62;&#62;&#62; p.dump(obj)
&#62;&#62;&#62; len(s.getvalue())
696&lt;/code&gt;

However, with memoization disabled, &lt;b&gt;pickle&lt;/b&gt; will enter an infinite loop if it encounters a cyclic object, e.g. &lt;code&gt;L = []; L.append(L)&lt;/code&gt;. And if you really care about size, nothing stops you from using the &lt;b&gt;gzip&lt;/b&gt; module.

Overall, there are a few neat things about your serialization protocol. I like how you organized the code: each type has its own class, instead of one big serializer class. I am sure there are some drawbacks to this approach, but it is neat nevertheless.</description>
		<content:encoded><![CDATA[<p>You can significantly reduce the size of the pickle stream by using the newer binary protocols. Using your example:</p>
<p><code>&gt;&gt;&gt; obj = []<br />
&gt;&gt;&gt; for i in xrange(100):<br />
...   obj.append((i, str(i)))<br />
...<br />
&gt;&gt;&gt; len(pickle.dumps(obj))<br />
2178<br />
&gt;&gt;&gt; len(pickle.dumps(obj, 2))<br />
1098</code></p>
<p>Anyway, I saw that you used <code>pickle.HIGHEST_PROTOCOL</code> in your <b>tickle</b> module, so you probably already knew this. Still, you can reduce the size even more by using the (undocumented) &#8216;fast&#8217; attribute, which turns off memoization.</p>
<p><code>&gt;&gt;&gt; s = StringIO()<br />
&gt;&gt;&gt; p = pickle.Pickler(s, 2)<br />
&gt;&gt;&gt; p.fast = True<br />
&gt;&gt;&gt; p.dump(obj)<br />
&gt;&gt;&gt; len(s.getvalue())<br />
696</code></p>
<p>However, with memoization disabled, <b>pickle</b> will enter an infinite loop if it encounters a cyclic object, e.g. <code>L = []; L.append(L)</code>. And if you really care about size, nothing stops you from using the <b>gzip</b> module.</p>
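<p>On Python 3 the same comparison can be sketched as below (sizes will differ from the Python 2 numbers above). It also tries <code>pickletools.optimize</code>, which removes PUT opcodes that no GET refers to. Unlike the fast attribute, it keeps the memo entries that are actually used, so cyclic objects still round-trip.</p>

```python
import io
import pickle
import pickletools

obj = [(i, str(i)) for i in range(100)]


def fast_dumps(value, protocol=2):
    """Pickle with the (deprecated) fast attribute, which disables the memo."""
    buf = io.BytesIO()
    p = pickle.Pickler(buf, protocol)
    p.fast = True
    p.dump(value)
    return buf.getvalue()


sizes = {
    "protocol 0": len(pickle.dumps(obj, 0)),
    "protocol 2": len(pickle.dumps(obj, 2)),
    "protocol 2, fast": len(fast_dumps(obj)),
    "protocol 2, optimized": len(pickletools.optimize(pickle.dumps(obj, 2))),
}
for name, size in sizes.items():
    print("%-22s %5d" % (name, size))

# A cyclic list still works after optimize(), because the PUT that the
# cycle's GET refers to is kept.
cyclic = []
cyclic.append(cyclic)
restored = pickle.loads(pickletools.optimize(pickle.dumps(cyclic, 2)))
```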
<p>Overall, there are a few neat things about your serialization protocol. I like how you organized the code: each type has its own class, instead of one big serializer class. I am sure there are some drawbacks to this approach, but it is neat nevertheless.</p>
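<p>For readers who haven&#8217;t looked at the code, the class-per-type layout can be sketched roughly as follows. This is a hypothetical illustration of the pattern, not Tickle&#8217;s actual classes or wire format: each type gets a small serializer class with a one-byte tag, and a registry maps Python types and tags to them, so adding a type means adding one class and one registry entry.</p>

```python
import io
import struct


class IntSerializer:
    tag = b"I"

    def dump(self, value, out, registry):
        out.write(struct.pack("!q", value))  # assumes the value fits in 64 bits

    def load(self, stream, registry):
        return struct.unpack("!q", stream.read(8))[0]


class StrSerializer:
    tag = b"S"

    def dump(self, value, out, registry):
        data = value.encode("utf-8")
        out.write(struct.pack("!I", len(data)) + data)

    def load(self, stream, registry):
        (length,) = struct.unpack("!I", stream.read(4))
        return stream.read(length).decode("utf-8")


class ListSerializer:
    tag = b"L"

    def dump(self, value, out, registry):
        out.write(struct.pack("!I", len(value)))
        for item in value:
            registry.dump(item, out)

    def load(self, stream, registry):
        (count,) = struct.unpack("!I", stream.read(4))
        return [registry.load(stream) for _ in range(count)]


class Registry:
    """Maps Python types to serializers and dispatches on a 1-byte tag."""

    def __init__(self, serializers):
        self.by_type = serializers
        self.by_tag = {s.tag: s for s in serializers.values()}

    def dump(self, value, out):
        serializer = self.by_type[type(value)]
        out.write(serializer.tag)
        serializer.dump(value, out, self)

    def load(self, stream):
        return self.by_tag[stream.read(1)].load(stream, self)


registry = Registry({int: IntSerializer(), str: StrSerializer(), list: ListSerializer()})


def dumps(value):
    out = io.BytesIO()
    registry.dump(value, out)
    return out.getvalue()


def loads(data):
    return registry.load(io.BytesIO(data))
```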
]]></content:encoded>
	</item>
</channel>
</rss>
