通过apache,和nginx模块去除html中的空格和tab

最近一个项目中,合作方要求去除html中的空格,不想改代码,所以百度了一下通过apache,和nginx模块去除html中的空格和tab的方案,下面记录下来:

一、nginx

nginx可以通过mod_strip模块来实现该功能

1. mod_strip安装:

# cd /usr/local/src/

# wget http://wiki.nginx.org/images/6/63/Mod_strip-0.1.tar.gz

# tar -xzvf Mod_strip-0.1.tar.gz

# cd nginx-1.4.2 //提前解压好的nginx

# ./configure --prefix=/usr/local/nginx-1.4.2 --add-module=../mod_strip

# make

# make install

2. mod_strip简单用法:

location / {

strip on;

}

strip指令:

语法: strip on|off

默认: off

可用配置段: main, http, server, location

所有响应给用户的MIME类型为text/html将会使用该模块

二、apache

apche可以通过mod_pagespeed http://www.modpagespeed.com/ 来实现,我的服务器是ubuntu

1、下载mode_pagespeed

wget https://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-beta_current_i386.deb

2、安装

dpkg -i mod-pagespeed*

执行完毕后会提示重启

3、修改配置文件

/etc/apache2/mods-available/pagespeed.conf,下面我着重介绍一下ModPagespeedEnableFilters这个是启用的过滤器,collapse_whitespace,remove_comments分别为替换空白字符和删除注释,其它参数请见mod_pagespeed官网

<IfModule pagespeed_module>
    # Turn on mod_pagespeed. To completely disable mod_pagespeed, you
    # can set this to "off".
    ModPagespeed on

    # We want VHosts to inherit global configuration.
    # If this is not included, they'll be independent (except for inherently
    # global options), at least for backwards compatibility.
    ModPagespeedInheritVHostConfig on

    # Direct Apache to send all HTML output to the mod_pagespeed
    # output handler.
    AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html
    AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER application/javascript
    AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/css

    # If you want mod_pagespeed process XHTML as well, please uncomment this
    # line.
    #AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER application/xhtml+xml

    # The ModPagespeedFileCachePath directory must exist and be writable
    # by the apache user (as specified by the User directive).
    ModPagespeedFileCachePath            "/var/cache/mod_pagespeed/"

    # LogDir is needed to store various logs, including the statistics log
    # required for the console.
    ModPagespeedLogDir "/var/log/pagespeed"

    # The locations of SSL Certificates is distribution-dependent.
    ModPagespeedSslCertDirectory "/etc/ssl/certs"
    

    # If you want, you can use one or more memcached servers as the store for
    # the mod_pagespeed cache.
    # ModPagespeedMemcachedServers localhost:11211

    # A portion of the cache can be kept in memory only, to reduce load on disk
    # (or memcached) from many small files.
    # ModPagespeedCreateSharedMemoryMetadataCache "/var/cache/mod_pagespeed/" 51200

    # Override the mod_pagespeed 'rewrite level'. The default level
    # "CoreFilters" uses a set of rewrite filters that are generally
    # safe for most web pages. Most sites should not need to change
    # this value and can instead fine-tune the configuration using the
    # ModPagespeedDisableFilters and ModPagespeedEnableFilters
    # directives, below. Valid values for ModPagespeedRewriteLevel are
    # PassThrough, CoreFilters and TestingCoreFilters.
    #
    # ModPagespeedRewriteLevel PassThrough

    # Explicitly disables specific filters. This is useful in
    # conjuction with ModPagespeedRewriteLevel. For instance, if one
    # of the filters in the CoreFilters needs to be disabled for a
    # site, that filter can be added to
    # ModPagespeedDisableFilters. This directive contains a
    # comma-separated list of filter names, and can be repeated.
    #
    #ModPagespeedDisableFilters remove_comments,collapse_whitespace

    # Explicitly enables specific filters. This is useful in
    # conjuction with ModPagespeedRewriteLevel. For instance, filters
    # not included in the CoreFilters may be enabled using this
    # directive. This directive contains a comma-separated list of
    # filter names, and can be repeated.
    #
    #ModPagespeedEnableFilters rewrite_javascript,rewrite_css
    ModPagespeedEnableFilters collapse_whitespace,elide_attributes,remove_comments

    # Explicitly forbids the enabling of specific filters using either query
    # parameters or request headers. This is useful, for example, when we do
    # not want the filter to run for performance or security reasons. This
    # directive contains a comma-separated list of filter names, and can be
    # repeated.
    #
    # ModPagespeedForbidFilters rewrite_images

    # How long mod_pagespeed will wait to return an optimized resource
    # (per flush window) on first request before giving up and returning the
    # original (unoptimized) resource. After this deadline is exceeded the
    # original resource is returned and the optimization is pushed to the
    # background to be completed for future requests. Increasing this value will
    # increase page latency, but might reduce load time (for instance on a
    # bandwidth-constrained link where it's worth waiting for image
    # compression to complete). If the value is less than or equal to zero
    # mod_pagespeed will wait indefinitely for the rewrite to complete before
    # returning.
    #
    # ModPagespeedRewriteDeadlinePerFlushMs 10

    # ModPagespeedDomain
    # authorizes rewriting of JS, CSS, and Image files found in this
    # domain. By default only resources with the same origin as the
    # HTML file are rewritten. For example:
    #
    #   ModPagespeedDomain cdn.myhost.com
    #
    # This will allow resources found on http://cdn.myhost.com to be
    # rewritten in addition to those in the same domain as the HTML.
    #
    # Other domain-related directives (like ModPagespeedMapRewriteDomain
    # and ModPagespeedMapOriginDomain) can also authorize domains.
    #
    # Wildcards (* and ?) are allowed in the domain specification. Be
    # careful when using them as if you rewrite domains that do not
    # send you traffic, then the site receiving the traffic will not
    # know how to serve the rewritten content.

    # Other defaults (cache sizes and thresholds):
    #
    # ModPagespeedFileCacheSizeKb          102400
    # ModPagespeedFileCacheCleanIntervalMs 3600000
    # ModPagespeedLRUCacheKbPerProcess     1024
    # ModPagespeedLRUCacheByteLimit        16384
    # ModPagespeedCssFlattenMaxBytes       2048
    # ModPagespeedCssInlineMaxBytes        2048
    # ModPagespeedCssImageInlineMaxBytes   0
    # ModPagespeedImageInlineMaxBytes      3072
    # ModPagespeedJsInlineMaxBytes         2048
    # ModPagespeedCssOutlineMinBytes       3000
    # ModPagespeedJsOutlineMinBytes        3000
    # ModPagespeedMaxCombinedCssBytes      -1
    # ModPagespeedMaxCombinedJsBytes       92160

    # Limit the number of inodes in the file cache. Set to 0 for no limit.
    # The default value if this paramater is not specified is 0 (no limit).
    ModPagespeedFileCacheInodeLimit        500000

    # Bound the number of images that can be rewritten at any one time; this
    # avoids overloading the CPU.  Set this to 0 to remove the bound.
    #
    # ModPagespeedImageMaxRewritesAtOnce      8

    # You can also customize the number of threads per Apache process
    # mod_pagespeed will use to do resource optimization. Plain
    # "rewrite threads" are used to do short, latency-sensitive work,
    # while "expensive rewrite threads" are used for actual optimization
    # work that's more computationally expensive. If you live these unset,
    # or use values <= 0 the defaults will be used, which is 1 for both
    # values when using non-threaded MPMs (e.g. prefork) and 4 for both
    # on threaded MPMs (e.g. worker and event). These settings can only
    # be changed globally, and not per virtual host.
    #
    # ModPagespeedNumRewriteThreads 4
    # ModPagespeedNumExpensiveRewriteThreads 4

    # Randomly drop rewrites (*) to increase the chance of optimizing
    # frequently fetched resources and decrease the chance of optimizing
    # infrequently fetched resources. This can reduce CPU load. The default
    # value of this parameter is 0 (no drops).  90 means that a resourced
    # fetched once has a 10% probability of being optimized while a resource
    # that is fetched 50 times has a 99.65% probability of being optimized.
    #
    # (*) Currently only CSS files and images are randomly dropped.  Images
    # within CSS files are not randomly dropped.
    #
    # ModPagespeedRewriteRandomDropPercentage 90

    # Many filters modify the URLs of resources in HTML files. This is typically
    # harmless but pages whose Javascript expects to read or modify the original
    # URLs may break. The following parameters prevent filters from modifying
    # URLs of their respective types.
    #
    # ModPagespeedJsPreserveURLs on
    # ModPagespeedImagePreserveURLs on
    # ModPagespeedCssPreserveURLs on

    # Settings for image optimization:
    #
    # Lossy image recompression quality (0 to 100, -1 just strips metadata):
    # ModPagespeedImageRecompressionQuality 85
    #
    # Jpeg recompression quality (0 to 100, -1 uses ImageRecompressionQuality):
    # ModPagespeedJpegRecompressionQuality -1
    # ModPagespeedJpegRecompressionQualityForSmallScreens 70
    #
    # WebP recompression quality (0 to 100, -1 uses ImageRecompressionQuality):
    # ModPagespeedImageWebpRecompressionQuality 80
    # ModPagespeedImageWebpRecompressionQualityForSmallScreens 70
    #
    # Timeout for conversions to WebP format, in
    # milliseconds. Negative values mean no timeout is applied. The
    # default value is -1:
    # ModPagespeedImageWebpTimeoutMs 5000
    #
    # Percent of original image size below which optimized images are retained:
    # ModPagespeedImageLimitOptimizedPercent 100
    #
    # Percent of original image area below which image resizing will be
    # attempted:
    # ModPagespeedImageLimitResizeAreaPercent 100

    # Settings for inline preview images
    #
    # Setting this to n restricts preview images to the first n images found on
    # the page.  The default of -1 means preview images can appear anywhere on
    # the page (if those images appear above the fold).
    # ModPagespeedMaxInlinedPreviewImagesIndex -1

    # Sets the minimum size in bytes of any image for which a low quality image
    # is generated.
    # ModPagespeedMinImageSizeLowResolutionBytes 3072

    # The maximum URL size is generally limited to about 2k characters
    # due to IE: See http://support.microsoft.com/kb/208427/EN-US.
    # Apache servers by default impose a further limitation of about
    # 250 characters per URL segment (text between slashes).
    # mod_pagespeed circumvents this limitation, but if you employ
    # proxy servers in your path you may need to re-impose it by
    # overriding the setting here.  The default setting is 1024
    # characters.
    #
    # ModPagespeedMaxSegmentLength 250

    # Uncomment this if you want to prevent mod_pagespeed from combining files
    # (e.g. CSS files) across paths
    #
    # ModPagespeedCombineAcrossPaths off

    # Renaming JavaScript URLs can sometimes break them.  With this
    # option enabled, mod_pagespeed uses a simple heuristic to decide
    # not to rename JavaScript that it thinks is introspective.
    #
    # You can uncomment this to let mod_pagespeed rename all JS files.
    #
    # ModPagespeedAvoidRenamingIntrospectiveJavascript off

    # Certain common JavaScript libraries are available from Google, which acts
    # as a CDN and allows you to benefit from browser caching if a new visitor
    # to your site previously visited another site that makes use of the same
    # libraries as you do.  Enable the following filter to turn on this feature.
    #
    # ModPagespeedEnableFilters canonicalize_javascript_libraries

    # The following line configures a library that is recognized by
    # canonicalize_javascript_libraries.  This will have no effect unless you
    # enable this filter (generally by uncommenting the last line in the
    # previous stanza).  The format is:
    #    ModPagespeedLibrary bytes md5 canonical_url
    # Where bytes and md5 are with respect to the *minified* JS; use
    # js_minify --print_size_and_hash to obtain this data.
    # Note that we can register multiple hashes for the same canonical url;
    # we do this if there are versions available that have already been minified
    # with more sophisticated tools.
    #
    # Additional library configuration can be found in
    # pagespeed_libraries.conf included in the distribution.  You should add
    # new entries here, though, so that file can be automatically upgraded.
    # ModPagespeedLibrary 43 1o978_K0_LNE5_ystNklf http://www.modpagespeed.com/rewrite_javascript.js

    # Explicitly tell mod_pagespeed to load some resources from disk.
    # This will speed up load time and update frequency.
    #
    # This should only be used for static resources which do not need
    # specific headers set or other processing by Apache.
    #
    # Both URL and filesystem path should specify directories and
    # filesystem path must be absolute (for now).
    #
    # ModPagespeedLoadFromFile "http://example.com/static/" "/var/www/static/"


    # Enables server-side instrumentation and statistics.  If this rewriter is
    # enabled, then each rewritten HTML page will have instrumentation javacript
    # added that sends latency beacons to /mod_pagespeed_beacon.  These
    # statistics can be accessed at /mod_pagespeed_statistics.  You must also
    # enable the mod_pagespeed_statistics and mod_pagespeed_beacon handlers
    # below.
    #
    # ModPagespeedEnableFilters add_instrumentation

    # The add_instrumentation filter sends a beacon after the page onload
    # handler is called. The user might navigate to a new URL before this. If
    # you enable the following directive, the beacon is sent as part of an
    # onbeforeunload handler, for pages where navigation happens before the
    # onload event.
    #
    # ModPagespeedReportUnloadTime on

    # Uncomment the following line so that ModPagespeed will not cache or
    # rewrite resources with Vary: in the header, e.g. Vary: User-Agent.
    # Note that ModPagespeed always respects Vary: headers on html content.
    # ModPagespeedRespectVary on

    # Uncomment the following line if you want to disable statistics entirely.
    #
    # ModPagespeedStatistics off

    # This page lets you view statistics about the mod_pagespeed module.
    <Location /mod_pagespeed_statistics>
        Order allow,deny
        # You may insert other "Allow from" lines to add hosts you want to
        # allow to look at generated statistics.  Another possibility is
        # to comment out the "Order" and "Allow" options from the config
        # file, to allow any client that can reach your server to examine
        # statistics.  This might be appropriate in an experimental setup or
        # if the Apache server is protected by a reverse proxy that will
        # filter URLs in some fashion.
        Allow from localhost
        Allow from 127.0.0.1
        SetHandler mod_pagespeed_statistics
    </Location>

    # Enable logging of mod_pagespeed statistics, needed for the console.
    ModPagespeedStatisticsLogging on

    <Location /pagespeed_console>
        Order allow,deny
        Allow from localhost
        Allow from 127.0.0.1
        SetHandler pagespeed_console
    </Location>

    # Page /mod_pagespeed_message lets you view the latest messages from
    # mod_pagespeed, regardless of log-level in your httpd.conf
    # ModPagespeedMessageBufferSize is the maximum number of bytes you would
    # like to dump to your /mod_pagespeed_message page at one time,
    # its default value is 100k bytes.
    # Set it to 0 if you want to disable this feature.
    ModPagespeedMessageBufferSize 100000

    <Location /mod_pagespeed_message>
        Order allow,deny
        Allow from localhost
        Allow from 127.0.0.1
        SetHandler mod_pagespeed_message
    </Location>
</IfModule>