| {{redirect|Deflate||Deflation (disambiguation)}}
| {{Refimprove|date=January 2009}}
| |
| | |
| In [[computing]], '''deflate''' is a [[data compression]] [[algorithm]] that uses a combination of the [[LZ77 and LZ78|LZ77]] algorithm and [[Huffman coding]]. It was originally defined by [[Phil Katz]] for version 2 of his [[PKZIP]] archiving tool and was later specified in RFC 1951.
| |
| | |
| The original algorithm as designed by Katz was patented as {{cite patent |country=US |number=5051745 |status=patent}} and assigned to [[PKWARE]].<ref>{{cite book|title=Data Compression: The Complete Reference|last=Salomon|first=David|year=2007|edition=4|page=241|publisher=Springer|isbn=978-1-84628-602-5|url=http://books.google.com/books?id=ujnQogzx_2EC&pg=PA241}}</ref> Deflate is widely thought{{Clarify|date=December 2013}} to be implementable in a manner not covered by patents.<ref>{{cite IETF |title=DEFLATE Compressed Data Format Specification version 1.3 |rfc=1951 |sectionname= |section=Abstract |page=1 |authorlink=L Peter Deutsch |year=1996 |month=May |publisher=[[Internet Engineering Task Force|IETF]] |accessdate=11 November 2012 }}</ref> This has led to its widespread use, for example in [[gzip]] compressed files, [[Portable Network Graphics|PNG]] image files and the [[ZIP (file format)|ZIP]] file format for which Katz originally designed it.
| |
| | |
| == Stream format ==
| |
| A Deflate stream consists of a series of blocks. Each block is preceded by a 3-[[bit]] header:
| |
| | |
| * First bit: Last-block-in-stream marker:
| |
| ** <code>1</code>: this is the last block in the stream.
| |
| ** <code>0</code>: there are more blocks to process after this one.
| |
| * Second and third bits: Encoding method used for this block type:
| |
| ** <code>00</code>: a stored/raw/literal section, between 0 and 65,535 bytes in length.
| |
| ** <code>01</code>: a ''static Huffman'' compressed block, using a pre-agreed Huffman tree.
| |
| ** <code>10</code>: a compressed block complete with the Huffman table supplied.
| |
| ** <code>11</code>: reserved, don't use.
| |
| | |
| Most blocks will end up being encoded using method <code>10</code>, the ''dynamic Huffman'' encoding, which produces an optimised Huffman tree customised for each block of data individually. Instructions to generate the necessary Huffman tree immediately follow the block header.
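The header layout described above can be illustrated with a minimal Python sketch. It assumes a ''raw'' Deflate stream (no zlib or gzip wrapper) and relies on Deflate packing header bits starting from the least significant bit of each byte; the helper names <code>raw_deflate</code> and <code>first_block_header</code> are illustrative only and not part of any library.

```python
import zlib

def raw_deflate(data, level):
    """Compress to a raw Deflate stream (negative wbits omits the zlib wrapper)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

def first_block_header(raw):
    """Return (is_last, block_type) read from the first three bits of the stream."""
    first_byte = raw[0]
    is_last = first_byte & 1            # first bit: last-block-in-stream marker
    block_type = (first_byte >> 1) & 3  # next two bits: 0 stored, 1 static, 2 dynamic
    return is_last, block_type

stored = raw_deflate(b"hello", 0)       # level 0 asks zlib to store without compressing
print(first_block_header(stored))
```

Only the first block is inspected here; locating later block headers requires decoding the blocks before them, because blocks are not byte-aligned in general.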
| |
| | |
| Compression is achieved through two steps:
| |
| | |
| * The matching and replacement of duplicate strings with pointers.
| |
| * Replacing symbols with new, weighted symbols based on frequency of use.
| |
| | |
| === Duplicate string elimination ===
| |
| {{main|LZ77 and LZ78}}
| |
| | |
| Within compressed blocks, if a duplicate series of bytes is spotted (a repeated string), then a back-[[Reference (computer science)|reference]] is inserted, linking to the previous location of that identical string instead. An encoded match to an earlier string consists of a length (3–258 bytes) and a distance (1–32,768 bytes). Relative back-references can be made across any number of blocks, as long as the distance appears within the last 32 kB of uncompressed data decoded (termed the ''sliding window'').
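How a decoder resolves such back-references can be sketched as follows. The token representation (an integer for a literal byte, a <code>(length, distance)</code> tuple for a match) and the function name <code>apply_tokens</code> are assumptions made for illustration; they are not the encoded bit format itself.

```python
def apply_tokens(tokens):
    """Expand literals and (length, distance) back-references into bytes."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)                      # literal byte
            continue
        length, distance = token
        if not (3 <= length <= 258 and 1 <= distance <= 32768):
            raise ValueError("match outside the ranges Deflate can encode")
        if distance > len(out):
            raise ValueError("back-reference points before the start of the data")
        start = len(out) - distance
        # Copy byte by byte: when length > distance the source overlaps the
        # bytes being written, which is how short patterns get repeated.
        for i in range(length):
            out.append(out[start + i])
    return bytes(out)

print(apply_tokens([ord("a"), ord("b"), ord("c"), (6, 3)]))  # b'abcabcabc'
```

The byte-by-byte copy is deliberate: a match may be longer than its distance, in which case it reads bytes that the same match has just produced.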
| |
| | |
| === Bit reduction ===
| |
| {{main|Huffman coding}}
| |
| | |
| The second compression stage consists of replacing commonly used symbols with shorter representations and less commonly used symbols with longer representations. The method used is [[Huffman coding]], which creates a prefix-free code tree in which no code is the prefix of another, and in which the length of each bit sequence is approximately proportional to the negative logarithm of the probability of that symbol needing to be encoded. The more likely it is that a symbol has to be encoded, the shorter its bit sequence will be.
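The relationship between symbol frequency and code length can be illustrated with a textbook Huffman construction in Python. This is a sketch only: the function name <code>huffman_code_lengths</code> is hypothetical, and the code does not enforce the 15-bit limit on code lengths that Deflate imposes, nor does it reproduce how any particular encoder builds its trees.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for the bytes in data."""
    freq = Counter(data)
    if len(freq) == 1:
        return {sym: 1 for sym in freq}            # a lone symbol still needs one bit
    # Heap entries: (total count, unique tie-breaker, symbols in this subtree).
    heap = [(count, idx, [sym])
            for idx, (sym, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, syms1 = heapq.heappop(heap)     # two least frequent subtrees
        count2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1                      # each merge adds one bit of depth
        heapq.heappush(heap, (count1 + count2, next_id, syms1 + syms2))
        next_id += 1
    return lengths

lengths = huffman_code_lengths(b"aaaaaaaabbbbccd")
print({chr(sym): bits for sym, bits in sorted(lengths.items())})
```

For this input the most frequent byte receives the shortest code and the rarest bytes the longest.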
| |
| | |
| A tree is created which contains space for 288 symbols:
| |
| | |
| * 0–255: represent the literal bytes/symbols 0–255.
| |
| * 256: end of block – stop processing if last block, otherwise start processing next block.
| |
| * 257–285: combined with extra-bits, a match length of 3–258 bytes.
| |
| * 286, 287: not used, reserved and illegal but still part of the tree.
| |
| | |
| A match length code will always be followed by a distance code. Based on the distance code read, further "extra" bits may be read in order to produce the final distance. The distance tree contains space for 32 symbols:
| |
| | |
| * 0–3: distances 1–4
| |
| * 4–5: distances 5–8, 1 extra bit
| |
| * 6–7: distances 9–16, 2 extra bits
| |
| * 8–9: distances 17–32, 3 extra bits
| |
| * ...
| |
| * 26–27: distances 8,193–16,384, 12 extra bits
| |
| * 28–29: distances 16,385–32,768, 13 extra bits
| |
| * 30–31: not used, reserved and illegal but still part of the tree.
| |
| | |
| Note that for the match distance symbols 2–29, the number of extra bits can be calculated as <math>\left\lfloor\frac{n}{2}\right\rfloor-1</math>, where <math>n</math> is the symbol number.
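The distance table can be regenerated from that rule rather than written out by hand. The following Python sketch (with the hypothetical helper names <code>distance_table</code> and <code>decode_distance</code>) derives the base distance and extra-bit count for symbols 0–29, taking symbols 0 and 1 to have no extra bits as in the list above.

```python
def distance_table():
    """Return a list of (base_distance, extra_bits) for distance symbols 0-29."""
    table = []
    base = 1
    for symbol in range(30):
        extra_bits = max(symbol // 2 - 1, 0)   # floor(n/2) - 1, and 0 for symbols 0-1
        table.append((base, extra_bits))
        base += 1 << extra_bits                # each symbol covers 2**extra_bits distances
    return table

def decode_distance(symbol, extra_value):
    """Combine a distance symbol with the value of its extra bits."""
    base, extra_bits = distance_table()[symbol]
    if not 0 <= extra_value < (1 << extra_bits):
        raise ValueError("extra value does not fit in the extra bits")
    return base + extra_value

table = distance_table()
print(table[4], table[28], table[29])   # (5, 1) (16385, 13) (24577, 13)
```

Symbol 29 with all 13 extra bits set yields 24,577 + 8,191 = 32,768, the largest distance the format can express.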
| |
| | |
| == Encoder/compressor ==
| |
| During the compression stage, it is the ''encoder'' that chooses the amount of time spent looking for matching strings. The zlib/gzip reference implementation allows the user to select from a [[sliding scale]] of likely resulting compression-level vs. speed of encoding. Options range from <code>-0</code> (do not attempt compression, just store uncompressed) to <code>-9</code> representing the maximum capability of the reference implementation in zlib/gzip.
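The effect of the level setting can be observed through Python's standard <code>zlib</code> module, which wraps the zlib library. The sample data and the helper name <code>compressed_sizes</code> below are arbitrary choices for illustration; actual sizes depend on the input and on the zlib version, so no particular figures are claimed.

```python
import zlib

def compressed_sizes(data, levels=(0, 1, 6, 9)):
    """Return {level: size in bytes of the zlib-wrapped Deflate output}."""
    return {level: len(zlib.compress(data, level)) for level in levels}

sample = b"the quick brown fox jumps over the lazy dog. " * 200
sizes = compressed_sizes(sample)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes (input {len(sample)} bytes)")
```

Level 0 output is slightly larger than the input because stored blocks and the zlib wrapper add a few bytes of framing.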
| |
| | |
| Other Deflate encoders have been produced, all of which produce a compatible [[bitstream]] that can be decompressed by any existing Deflate decoder. Different implementations will likely produce different encoded bitstreams for the same input. The focus with non-zlib versions of an encoder has normally been to produce a more efficiently compressed and smaller encoded stream.
| |
| | |
| ===Deflate64/Enhanced Deflate===
| |
| Deflate64, specified by PKWare, is a proprietary variant of the Deflate procedure. The fundamental mechanisms remain the same. The changes are an increase in dictionary size from 32 kB to 64 kB, an addition of 14 bits to the distance codes so that they may address a range of 64 kB, and an extension of the length code by 16 bits so that it may define lengths of 3 to 65,538 bytes.<ref>[http://www.binaryessence.com/dct/imp/en000225.htm Binary Essence - Deflate64]</ref> This leads to Deflate64 having a slightly higher compression ratio and a slightly lower compression time than Deflate.<ref>[http://www.binaryessence.com/dct/apc/en000263.htm Binary Essence - "Calgary Corpus" compression comparisons]</ref> Several free and/or open source projects support Deflate64, such as [[7-Zip]],<ref>[http://docs.bugaco.com/7zip/MANUAL/switches/method.htm 7-Zip Manual and Documentation - compression Method]</ref> while others, such as [[zlib]], do not, as a result of the proprietary nature of the procedure{{citation needed|date=January 2012}} and the very modest performance increase over Deflate.<ref>zlib FAQ - Does zlib support the new "Deflate64" format introduced by PKWare? [http://www.zlib.net/zlib_faq.html#faq40]</ref>
| |
| | |
| == Using Deflate in new software ==
| |
| Implementations of Deflate are freely available in many languages. C programs typically use the zlib library (under the permissive [[license of zlib/libpng|zlib license]]). Programs written using the [[Borland]] dialects of Pascal can use paszlib; a [[C++]] library is included as part of [[7-Zip]]/[[AdvanceCOMP]]. Java includes support as part of the standard library (in <code>java.util.zip</code>). The [[Microsoft .NET Framework]] 2.0 base class library supports it in the <code>System.IO.Compression</code> namespace.
| |
| | |
| === Encoder implementations ===
| |
| * [[PKZIP]]: the first implementation, originally written by [[Phil Katz]].
| |
| * [[zlib]]/[[gzip]]: standard reference implementation used in a huge amount of software, owing to public availability of the source code and a license allowing inclusion into other software.
| |
| ** [http://www.jcraft.com/jzlib/ jzlib]: Rewrite/re-implementation/port of the <code>zlib</code> encoder into pure [[Java (programming language)|Java]] and distributed under a [[BSD license]]. (Fully featured replacement for <code>java.util.zip</code>).
| |
| ** [http://www.nomssi.de/paszlib/paszlib.html PasZLIB]: Translation/port of the <code>zlib</code> code into [[Pascal (programming language)|Pascal]] source code by Jacques Nomssi-Nzali.
| |
| ** [http://sourceforge.net/projects/gziplite/ gziplite]: Minimalist rework of <code>gzip</code> / <code>gunzip</code> with minimal memory requirement, also supporting on-the-fly data compression/decompression (no need to buffer all input) and input/output to/from memory.
| |
| * [http://code.google.com/p/miniz/ miniz] - Public domain Deflate/Inflate implementation with a zlib-compatible API in a single .C source file
| |
| * [http://lodev.org/lodepng/ lodepng] by Lode Vandevenne. A [[BSD license|BSD-licensed]] single file PNG file reader with built-in C++ Inflate implementation and no external dependencies.
| |
| * [http://advsys.net/ken/utils.htm#kzip KZIP]/[[PNGOUT]]: an encoder by the game-programmer [[Ken Silverman]] using <cite>"an exhaustive search of all patterns"</cite> and <cite>"[an] advanced block splitter"</cite>.
| |
| * [http://www.cs.tut.fi/~albert/Dev/puzip/ PuZip]: designed for [[Commodore 64]]/[[Commodore 128|C128]] computers. PuZip is limited to an 8kB LZ77 window size, with only the store (type <code>00</code>) and fixed Huffman (type <code>01</code>) methods.
| |
| * [http://www.bigspeed.net/index.php?page=bsdefdll BigSpeed Deflate]: <cite>"Tiny in-memory compression library"</cite> available as a MS Windows DLL limited to 32kB blocks at a time and three compression settings.
| |
| * [http://www.walbeehm.com/download/ BJWFlate & DeflOpt]/[[DeflOpt]]: Ben Jos Walbeehm's utilities <cite>"designed to attempt to squeeze every possible byte out of the files it compresses"</cite>. Note that the author stopped development on BJWFlate (but not DeflOpt) in March 2004.
| |
| * [[Crypto++]]: contains a public domain implementation in [[C++]] aimed mainly at reducing potential [[Vulnerability (computing)|security vulnerabilities]]. The author, Wei Dai, states "<cite>This code is less clever, but hopefully more understandable and maintainable [than zlib]</cite>".
| |
| * [http://msdn.microsoft.com/en-us/library/system.io.compression.deflatestream.aspx DeflateStream]: an implementation of a stream that performs DEFLATE compression, packaged with the Base Class Library included with the .NET Framework.
| |
| * [http://cheeso.members.winisp.net/DotNetZipHelp/html/26cbdba2-021a-ccf1-a9c9-b7ae55f6ecb8.htm ParallelDeflateOutputStream] - an open source stream that implements a parallel (multi-thread) deflating stream, for use in .NET programs.
| |
| * [[7-Zip]]/[[AdvanceCOMP]]: written by Igor Pavlov in [[C++]], this version is freely licensed and tends to achieve higher compression than zlib at the expense of CPU usage. Has an option to use the DEFLATE64 storage format.
| |
| <!-- * [[NetBSD]] [http://cvsweb.netbsd.org/cgi-bin/cvsweb.cgi/src/usr.bin/gzip/gzip.c `gzip`]: written by Matthew R. Green, BSD licenced. non-GPL frontend onto zlib. -->
| |
| * [http://seed7.sourceforge.net/libraries/deflate.htm deflate.s7i]/[http://seed7.sourceforge.net/libraries/gzip.htm gzip.s7i], a pure-[[Seed7]] implementation of Deflate and gzip compression, by Thomas Mertes. Made available under the GNU [[GNU Lesser General Public License|LGPL]] license.
| |
| * [[PuTTY]] <code>sshzlib.c</code>: a standalone implementation by Simon Tatham, capable of full decode but able to create only static trees. [[MIT License|MIT licensed]].
| |
| * [http://www.chiark.greenend.org.uk/~sgtatham/halibut/ Halibut] <code>deflate.c</code>: a standalone implementation capable of full decode. Forked from PuTTY's <code>sshzlib.c</code>, but extended to write dynamic Huffman trees and to provide Adler-32 and CRC-32 checksum support.
| |
| * [[Plan 9 from Bell Labs]] operating system's [http://plan9.bell-labs.com/sources/plan9/sys/src/libflate/ libflate] implements deflate compression.
| |
| * [[Red Gate Software#HyperBac|Hyperbac]]: uses its own proprietary lossless compression library (written in C++ and Assembly) with an option to implement the DEFLATE64 storage format.
| |
| * [http://gildas-lormeau.github.com/zip.js/ zip.js]: JavaScript implementation.
| |
| * [[Zopfli]]: C implementation by Google, released under the Apache License 2.0, that aims for the highest compression at the expense of CPU usage.
| |
| | |
| [[AdvanceCOMP]] uses the higher compression ratio version of Deflate as implemented by 7-Zip to enable recompression of [[gzip]], [[Portable Network Graphics|PNG]], [[Multiple-image Network Graphics|MNG]] and [[ZIP file format|ZIP]] files with the possibility of achieving smaller file sizes than zlib is able to at maximum settings. An even more effective (but also more user-input-demanding and CPU intensive) Deflate encoder is employed inside [[Ken Silverman]]'s KZIP and [[PNGOUT]] utilities.
| |
| | |
| === Hardware encoders ===
| |
| * AHA361-PCIX/AHA362-PCIX from [http://www.aha.com/ Comtech AHA]. Comtech produced a [[PCI-X]] card (PCI-ID: <code>193f:0001</code>) capable of compressing streams using Deflate at a rate of up to 3.0 Gbit/s (375 MB/s) for incoming uncompressed data. Accompanying the [[Linux (kernel)|Linux kernel]] [[Device driver|driver]] for the AHA361-PCIX is an '<code>ahagzip</code>' utility and customised '<code>mod_deflate_aha</code>' capable of using the hardware compression from [[Apache HTTP Server|Apache]]. The hardware is based on a [[Xilinx]] [[Virtex (FPGA)|Virtex]] [[FPGA]] and four custom AHA3601 [[Application-specific integrated circuit|ASICs]]. The AHA361/AHA362 boards are limited to only handling static Huffman blocks and require software to be modified to add support—the cards were not able to support the full Deflate specification meaning they could only reliably decode their own output (a stream that did not contain any dynamic Huffman type 2 blocks).
| |
| * [http://www.indranetworks.com/SC300.html StorCompress 300]/[http://www.indranetworks.com/SCMX3.html MX3] from [http://www.indranetworks.com/ Indra Networks]. This is a range of [[PCI Local Bus|PCI]] (PCI-ID: <code>17b4:0011</code>) or PCI-X cards featuring between one and six compression engines with claimed processing speeds of up to 3.6 Gbit/s (450 MB/s). A version of the cards is available with the separate brand ''WebEnhance'' specifically designed for web-serving use rather than [[Storage area network|SAN]] or backup use; a [[PCIe]] revision, the [http://www.indranetworks.com/SCMX4E.html MX4E] is also produced.
| |
| * [http://www.aha.com/show_prod.php?id=36 AHA363-PCIe]/[http://www.aha.com/show_prod.php?id=37 AHA364-PCIe]/[http://www.aha.com/show_prod.php?id=38 AHA367-PCIe]. In 2008, Comtech started producing two PCIe cards (<code>PCI-ID: 193f:0363</code>/<code>193f:0364</code>) with a new hardware AHA3610 encoder chip. The new chip was designed to be capable of a sustained 2.5 Gbit/s. Using two of these chips, the AHA363-PCIe board can process Deflate at a rate of up to 5.0 Gbit/s (625 MB/s) using the two channels (two compression and two decompression). The AHA364-PCIe variant is an encode-only version of the card designed for outgoing [[load balancer]]s and instead has multiple register sets to allow 32 independent ''virtual'' compression channels feeding two physical compression engines. Linux, [[Microsoft Windows]], and [[OpenSolaris]] kernel device drivers are available for both of the new cards, along with a modified zlib system library so that dynamically linked applications can automatically use the hardware support without internal modification. The AHA367-PCIe board (<code>PCI-ID: 193f:0367</code>) is similar to the AHA363-PCIe but uses four AHA3610 chips for a sustained compression rate of 10 Gbit/s (1250 MB/s). Unlike the AHA362-PCIX, the decompression engines on the AHA363-PCIe and AHA367-PCIe boards are fully Deflate-compliant.
| |
| | |
| == Decoder/decompressor ==
| |
| Inflate is the decoding process that takes a Deflate bit stream for decompression and correctly produces the original full-size data or file.
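A minimal Python sketch of inflating a raw Deflate stream with the standard <code>zlib</code> module follows. The function name <code>inflate_raw</code> is illustrative; the negative window-bits value tells zlib to expect no zlib or gzip wrapper around the Deflate data. The example input is a hand-built stored block: a header byte with the last-block bit set and block type <code>00</code>, the 16-bit length 5, its one's complement, and then the five literal bytes.

```python
import zlib

def inflate_raw(stream):
    """Decode a raw Deflate stream (no zlib/gzip header) back to the original bytes."""
    decoder = zlib.decompressobj(-15)      # negative wbits: raw Deflate, 32 kB window
    return decoder.decompress(stream) + decoder.flush()

# Stored block: header 0x01, LEN = 0x0005, NLEN = 0xFFFA (both little-endian), data.
stored_block = b"\x01\x05\x00\xfa\xff" + b"hello"
print(inflate_raw(stored_block))           # b'hello'
```

The same function decodes the output of any conforming encoder, since all block types share the one stream format.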
| |
| | |
| === Inflate-only implementations ===
| |
| The normal intent with an alternative Inflate implementation is highly optimised decoding speed, or extremely predictable RAM usage for micro-controller embedded systems.
| |
| | |
| * [[Assembly language|Assembly]]
| |
| ** [http://xasm.atari.org/inflate.html 6502 inflate], written by Piotr Fusik in [[MOS Technology 6502|6502]] assembly language.
| |
| ** [http://www.pisi.com.pl/piotr433/mk90mc1e.htm#inflate Elektronika MK-90 inflate], the above 6502 program ported by Piotr Piatek to the [[PDP-11 architecture]].
| |
| ** [http://sourceforge.net/projects/samflate/ SAMflate], written by Andrew Collier in [[Z80]] assembly language with optional memory paging support for the [[SAM Coupé]], and made available under the [[BSD license|BSD]]/[[GNU General Public License|GPL]]/[[GNU Lesser General Public License|LGPL]]/[[Debian Free Software Guidelines|DFSG]] licenses.
| |
| <!-- ** <code>PCUNZP.ASM</code>, by Michael Mefford. written in [[x86]] [[assembly language]] and published in PC Magazine 1992-03-31. need to check, might not have supported the newer ''Deflate'' method -->
| |
| | |
| * [[C (programming language)|C]]/[[C++]]
| |
| ** [http://www.mikekohn.net/file_formats/kunzip.php kunzip] by Michael Kohn and unrelated to "KZIP". Comes with [[C (programming language)|C]] source-code under the GNU [[LGPL]] license. Used in the [[GIMP]] installer.
| |
| ** puff.c ([[zlib]]), a small, unencumbered, single-file reference implementation included in the /contrib/puff directory of the zlib distribution.
| |
| ** [http://www.ibsensoftware.com/download.html tinf], written by Jørgen Ibsen in ANSI C and distributed under the zlib license. Adds about 2 kB of code.
| |
| ** [http://code.google.com/p/miniz/source/browse/trunk/tinfl.c tinfl.c] ([http://code.google.com/p/miniz/ miniz]), Public domain Inflate implementation contained entirely in a single C function.
| |
| | |
| * <code>PCDEZIP</code>, Bob Flanders and Michael Holmes, published in PC Magazine 1994-01-11.
| |
| * [http://opensource.franz.com/deflate/ inflate.cl] by John Foderaro. Self-standing [[Common Lisp]] decoder distributed with a GNU [[LGPL]] license.
| |
| * [http://seed7.sourceforge.net/libraries/inflate.htm inflate.s7i]/[http://seed7.sourceforge.net/libraries/gzip.htm gzip.s7i], a pure-[[Seed7]] implementation of Deflate and gzip decompression, by Thomas Mertes. Made available under the GNU [[GNU Lesser General Public License|LGPL]] license.
| |
| * [http://www.paul.sladen.org/projects/pyflate/ pyflate], a pure-[[Python (programming language)|Python]] stand-alone Deflate ([[gzip]]) and [[bzip2]] decoder by Paul Sladen. Written for research/prototyping and made available under the [[BSD license|BSD]]/[[GNU General Public License|GPL]]/[[GNU Lesser General Public License|LGPL]]/[[Debian Free Software Guidelines|DFSG]] licenses.
| |
| * [http://lua-users.org/wiki/ModuleCompressDeflateLua deflatelua], a pure-[[Lua (programming language)|Lua]] implementation of Deflate and [[gzip]]/zlib decompression, by David Manura.
| |
| * [https://github.com/chrisdickinson/inflate inflate], a pure-[[Javascript (programming language)|Javascript]] implementation of Inflate by Chris Dickinson.
| |
| | |
| ===Hardware decoders===
| |
| * [http://www.bitsim.com/en/badge.htm Serial Inflate GPU] from BitSim. Hardware implementation of Inflate. Part of BitSim's ''BADGE'' (Bitsim Accelerated Display Graphics Engine) controller offering for embedded systems.
| |
| | |
| == See also ==
| |
| * [[List of archive formats]]
| |
| * [[List of file archivers]]
| |
| * [[Comparison of file archivers]]
| |
| | |
| ==References==
| |
| {{reflist|30em}}
| |
| | |
| == External links ==
| |
| * [[PKWARE]]'s <code>appnote.txt</code>, [http://www.pkware.com/documents/casestudies/APPNOTE.TXT ''.ZIP File Format Specification'']; Section 10, ''X. Deflating - Method 8''.
| |
| * RFC 1951 – ''Deflate Compressed Data Format Specification version 1.3''
| |
| * [http://www.zlib.net zlib Home Page]
| |
| * [http://zlib.net/feldspar.html ''An Explanation of the Deflate Algorithm''] by Antaeus Feldspar.
| |
| * [http://www.larsson.dogma.net/dccpaper.pdf ''Extended Application of Suffix Trees to Data Compression''] by Jesper Larsson, describing an algorithm that can be used to implement Deflate.
| |
| | |
| {{Compression Methods}}
| |
| | |
| [[Category:Lossless compression algorithms]]
| |