---------------------------------------------
Miscellaneous functions for manipulating text
---------------------------------------------
Collection of text functions that don't fit in another category.

guess_encoding(byte_string, disable_chardet=False)

    Try to guess the encoding of a byte :class:`str`
    :arg byte_string: byte :class:`str` to guess the encoding of
    :kwarg disable_chardet: If this is True, we never attempt to use
        :mod:`chardet` to guess the encoding.  This is useful if you need to
        have reproducibility whether :mod:`chardet` is installed or not.
        Default: :data:`False`.
    :raises TypeError: if :attr:`byte_string` is not a byte :class:`str` type
    :returns: string containing a guess at the encoding of
        :attr:`byte_string`.  This is appropriate to pass as the encoding
        argument when encoding and decoding unicode strings.
    We start by attempting to decode the byte :class:`str` as :term:`UTF-8`.
    If this succeeds we tell the world it's :term:`UTF-8` text.  If it doesn't
    and :mod:`chardet` is installed on the system and :attr:`disable_chardet`
    is False this function will use it to try detecting the encoding of
    :attr:`byte_string`.  If it is not installed or :mod:`chardet` cannot
    determine the encoding with a high enough confidence then we rather
    arbitrarily claim that it is ``latin-1``.  Since every byte value maps to
    a character in ``latin-1``, decoding from ``latin-1`` to :class:`unicode`
    will never raise a :exc:`UnicodeError`, although the output might be
    mangled.
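
The fallback chain described above (strict UTF-8 first, then an optional detector, then ``latin-1``) can be sketched as follows. This is a Python 3 approximation rather than the module's own code: the name `guess_encoding_sketch` and the constant `CONFIDENCE_THRESHOLD` are hypothetical, and the 0.6 cut-off is an assumption made for illustration.

```python
try:
    import chardet  # optional third-party detector; may be absent
except ImportError:
    chardet = None

# Assumed cut-off for trusting the detector (illustrative value).
CONFIDENCE_THRESHOLD = 0.6


def guess_encoding_sketch(byte_string, disable_chardet=False):
    """Return a best-guess encoding name for a bytes object."""
    if not isinstance(byte_string, bytes):
        raise TypeError("byte_string must be a byte string")

    # Step 1: if the bytes decode strictly as UTF-8, report UTF-8.
    try:
        byte_string.decode("utf-8", "strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Step 2: ask chardet, but only if it is installed and allowed.
    if chardet is not None and not disable_chardet:
        info = chardet.detect(byte_string)
        if info["encoding"] and info["confidence"] >= CONFIDENCE_THRESHOLD:
            return info["encoding"]

    # Step 3: latin-1 can decode any byte sequence, so it never fails.
    return "latin-1"
```

Passing ``disable_chardet=True`` keeps the result reproducible whether or not the detector is installed, which is the use case the docstring mentions.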

str_eq(str1, str2, encoding='utf-8', errors='replace')

    Compare two strings, converting to byte :class:`str` if one is
    :class:`unicode`
    :arg str1: First string to compare
    :arg str2: Second string to compare
    :kwarg encoding: If we need to convert one string into a byte :class:`str`
        to compare, the encoding to use.  Default is :term:`utf-8`.
    :kwarg errors: What to do if we encounter errors when encoding the string.
        See the :func:`kitchen.text.converters.to_bytes` documentation for
        possible values.  The default is ``replace``.
    This function prevents :exc:`UnicodeError` (python-2.4 or less) and
    :exc:`UnicodeWarning` (python 2.5 and higher) when we compare
    a :class:`unicode` string to a byte :class:`str`.  The errors normally
    arise because the conversion is done to :term:`ASCII`.  This function
    lets you convert to :term:`utf-8` or another encoding instead.
    .. note::
        When we need to convert one of the strings from :class:`unicode` in
        order to compare them we convert the :class:`unicode` string into
        a byte :class:`str`.  That means that strings can compare differently
        if you use different encodings for each.
    Note that ``str1 == str2`` is faster than this function if you can accept
    the following limitations:
    * Limited to python-2.5+ (otherwise a :exc:`UnicodeDecodeError` may be
      thrown)
    * Will generate a :exc:`UnicodeWarning` if non-:term:`ASCII` byte
      :class:`str` is compared to :class:`unicode` string.
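
A rough Python 3 analogue of this comparison is sketched below. In Python 3 the text type is `str` and the byte type is `bytes`, so the sketch encodes whichever argument is text before comparing. The name `str_eq_sketch` is hypothetical, and the code illustrates the idea rather than reproducing the module's implementation.

```python
def str_eq_sketch(str1, str2, encoding="utf-8", errors="replace"):
    """Compare two strings, encoding the text one if the types differ."""
    if isinstance(str1, str) and isinstance(str2, bytes):
        # Convert the text side to bytes so both sides share a type.
        str1 = str1.encode(encoding, errors)
    elif isinstance(str1, bytes) and isinstance(str2, str):
        str2 = str2.encode(encoding, errors)
    return str1 == str2


# The same text compares differently depending on the encoding chosen.
same_utf8 = str_eq_sketch("caf\u00e9", "caf\u00e9".encode("utf-8"))
same_mixed = str_eq_sketch("caf\u00e9", b"caf\xe9")
same_latin1 = str_eq_sketch("caf\u00e9", b"caf\xe9", encoding="latin-1")
```

The last two calls show the caveat from the note: the byte string ``b"caf\xe9"`` only matches when the comparison encoding is the one the bytes were written in.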

process_control_chars(string, strategy='replace')

    Look for and transform :term:`control characters` in a string
    :arg string: string to search for and transform :term:`control characters`
        within
    :kwarg strategy: XML does not allow :term:`ASCII` :term:`control
        characters`.  When we encounter those we need to know what to do.
        Valid options are:
        :replace: (default) Replace the :term:`control characters`
            with ``"?"``
        :ignore: Remove the characters altogether from the output
        :strict: Raise a :exc:`~kitchen.text.exceptions.ControlCharError` when
            we encounter a control character
    :raises TypeError: if :attr:`string` is not a unicode string.
    :raises ValueError: if the strategy is not one of replace, ignore, or
        strict.
    :raises kitchen.text.exceptions.ControlCharError: if the strategy is
        ``strict`` and a :term:`control character` is present in the
        :attr:`string`
    :returns: :class:`unicode` string with no :term:`control characters` in
        it.
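
The three strategies can be sketched with `str.translate` in Python 3. The set of code points treated as control characters here (every code point below 32 except tab, line feed, and carriage return) is an assumption based on what XML 1.0 permits, and `ControlCharError` below is a local stand-in for the exception the docstring names.

```python
class ControlCharError(Exception):
    """Stand-in for kitchen.text.exceptions.ControlCharError."""


# Assumed set: C0 controls that XML 1.0 does not allow.
CONTROL_CODES = [c for c in range(32) if c not in (9, 10, 13)]


def process_control_chars_sketch(string, strategy="replace"):
    """Replace, drop, or reject control characters in a text string."""
    if not isinstance(string, str):
        raise TypeError("first argument must be a text string")

    if strategy == "ignore":
        # Mapping a code point to None deletes it during translate().
        table = dict.fromkeys(CONTROL_CODES)
    elif strategy == "replace":
        table = dict.fromkeys(CONTROL_CODES, "?")
    elif strategy == "strict":
        if any(ord(ch) in CONTROL_CODES for ch in string):
            raise ControlCharError("control code present in string input")
        return string
    else:
        raise ValueError("strategy must be ignore, replace, or strict")

    return string.translate(table)
```

Building the translation table once per call keeps the replace and ignore paths to a single pass over the string.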

html_entities_unescape(string)

    Substitute unicode characters for HTML entities
    :arg string: :class:`unicode` string to substitute out html entities
    :raises TypeError: if something other than a :class:`unicode` string is
        given
    :rtype: :class:`unicode` string
    :returns: The plain text without html entities
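
A Python 3 sketch of the same idea follows. It uses the standard library's `html.unescape` for the entity lookup and a regular expression that matches either a tag or an entity, dropping tags and decoding entities. The function name is hypothetical, and reliance on `html.unescape` is a substitution made for this sketch, not the module's own lookup.

```python
import html
import re

# Matches a whole tag, or a named/numeric character reference.
ENTITY_RE = re.compile(r"(?s)<[^>]*>|&#?\w+;")


def html_entities_unescape_sketch(string):
    """Strip tags and turn HTML entities into the characters they name."""
    if not isinstance(string, str):
        raise TypeError("first argument must be a text string")

    def fixup(match):
        text = match.group(0)
        if text.startswith("<"):
            return ""  # tags are removed entirely
        return html.unescape(text)  # entity -> character

    return ENTITY_RE.sub(fixup, string)


plain = html_entities_unescape_sketch("<b>caf&eacute;</b> &amp; t&#233;")
```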

byte_string_valid_xml(byte_string, encoding='utf-8')

    Check that a byte :class:`str` would be valid in xml
    :arg byte_string: Byte :class:`str` to check
    :arg encoding: Encoding of the xml file.  Default: :term:`UTF-8`
    :returns: :data:`True` if the string is valid.  :data:`False` if it would
        be invalid in the xml file
    In some cases you'll have a whole bunch of byte strings and rather than
    transforming them to :class:`unicode` and back to byte :class:`str` for
    output to xml, you will just want to make sure they work with the xml file
    you're constructing.  This function will help you do that.  Example::
        ARRAY_OF_MOSTLY_UTF8_STRINGS = [...]
        processed_array = []
        for string in ARRAY_OF_MOSTLY_UTF8_STRINGS:
            if byte_string_valid_xml(string, 'utf-8'):
                processed_array.append(string)
            else:
                processed_array.append(guess_bytes_to_xml(string, encoding='utf-8'))
        output_xml(processed_array)
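
The check itself amounts to two tests: the bytes must decode in the given encoding, and the decoded text must contain no control characters. A minimal Python 3 sketch is shown below, assuming the same control character set that XML 1.0 disallows. The names are hypothetical.

```python
# Assumed set: C0 controls other than tab, line feed, carriage return.
CONTROL_CHARS = frozenset(chr(c) for c in range(32) if c not in (9, 10, 13))


def byte_string_valid_xml_sketch(byte_string, encoding="utf-8"):
    """Return True if the bytes decode cleanly and hold no control chars."""
    if not isinstance(byte_string, bytes):
        return False
    try:
        text = byte_string.decode(encoding)
    except UnicodeError:
        return False
    # Valid only when no character of the text is a control character.
    return CONTROL_CHARS.isdisjoint(text)
```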

byte_string_valid_encoding(byte_string, encoding='utf-8')

    Detect if a byte :class:`str` is valid in a specific encoding
    :arg byte_string: Byte :class:`str` to test for bytes not valid in this
        encoding
    :kwarg encoding: encoding to test against.  Defaults to :term:`UTF-8`.
    :returns: :data:`True` if the byte :class:`str` contains no bytes that are
        invalid in the specified encoding.  :data:`False` if an invalid
        sequence is detected.
    .. note::
        This function checks whether the byte :class:`str` is valid in the
        specified encoding.  It **does not** detect whether the byte
        :class:`str` actually was encoded in that encoding.  If you want that
        sort of functionality, you probably want to use
        :func:`~kitchen.text.misc.guess_encoding` instead.
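
The distinction drawn in the note can be seen in a short Python 3 sketch: a trial decode only shows that the bytes are *valid* in an encoding, not that they were written in it. The function name is hypothetical.

```python
def byte_string_valid_encoding_sketch(byte_string, encoding="utf-8"):
    """Return True if the bytes can be decoded in the given encoding."""
    try:
        byte_string.decode(encoding)
    except UnicodeError:
        return False
    return True


utf8_bytes = "caf\u00e9".encode("utf-8")

# The UTF-8 bytes are valid UTF-8, as expected ...
valid_as_utf8 = byte_string_valid_encoding_sketch(utf8_bytes, "utf-8")
# ... but they are also valid latin-1, although they were never latin-1.
valid_as_latin1 = byte_string_valid_encoding_sketch(utf8_bytes, "latin-1")
```

Both results are true, which is why a validity check cannot stand in for encoding detection.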