November 3, 2011
November 2, 2011
enable utf8 for offlineimap
Get offlineimap working with non-ASCII characters.
Introduction
I recently started using offlineimap [1] to manage my Gmail accounts locally on my machine.
With offlineimap all my email is local, so navigation is very fast, and I am also a little more relieved to have a local copy of all my email.
The only problem is that offlineimap does not work well with non-English characters. This makes it difficult to use for those with labels in Russian, Chinese, Japanese etc. [2].
Fortunately offlineimap offers some nice options that allow us to modify its behavior. Using the "pythonfile" and "nametrans" options I was able to get Japanese working flawlessly.
Problem Description
The problem is that IMAP4 uses a modified UTF-7 encoding for all folder names. A good IMAP client like mutt understands this encoding and decodes it before presenting the folder name on screen. Unfortunately offlineimap does not convert the folder names before creating the local folders. This results in very cryptic names like "&MMYwuTDI-" that make it difficult to know which folder is which.
Some research (googling) on the topic turned up a very nice piece of code that adds IMAP4 UTF-7 encoding capabilities to Python [3]. So with a little time and some copy/paste skills I added this code to offlineimap and bang! I got Japanese working correctly in my folders.
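To make the cryptic name above concrete, here is a minimal Python 3 sketch of the decoding step. It is not the code from [3], and the function name decode_imap_utf7 is my own invention for illustration. It assumes well-formed input in which every "&" shift is closed by a "-".

```python
import base64


def decode_imap_utf7(name: str) -> str:
    """Decode an IMAP4 modified UTF-7 folder name into normal text."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "&":
            # Printable ASCII represents itself.
            out.append(ch)
            i += 1
            continue
        end = name.find("-", i)
        if end == -1:
            raise ValueError("unterminated '&' shift sequence")
        chunk = name[i + 1:end]
        if chunk == "":
            # "&-" is the escaped form of a literal "&".
            out.append("&")
        else:
            # Modified BASE64 uses "," instead of "/" and drops "=" padding.
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


print(decode_imap_utf7("&MMYwuTDI-"))
```

With this sketch the folder "&MMYwuTDI-" comes out as the Japanese word テスト ("test"), and the RFC example "~peter/mail/&ZeVnLIqe-/&U,BTFw-" comes out as "~peter/mail/日本語/台北".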
Fixing unicode issues in offlineimap
First we must add IMAP4 UTF-7 conversion support to offlineimap. Thanks to Dominic LoBue (offlineimap developer) I learned that we can add our own Python code to offlineimap. To do this we create a simple Python script (e.g. ~/.utf7.py) that contains the following code (copy/paste from [3]):
# -*- coding: latin-1 -*-
"""
Imap folder names are encoded using a special version of utf-7 as defined in RFC
2060 section 5.1.3.

5.1.3.  Mailbox International Naming Convention

   By convention, international mailbox names are specified using a
   modified version of the UTF-7 encoding described in [UTF-7].  The
   purpose of these modifications is to correct the following problems
   with UTF-7:

      1) UTF-7 uses the "+" character for shifting; this conflicts with
         the common use of "+" in mailbox names, in particular USENET
         newsgroup names.

      2) UTF-7's encoding is BASE64 which uses the "/" character; this
         conflicts with the use of "/" as a popular hierarchy delimiter.

      3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
         the use of "\" as a popular hierarchy delimiter.

      4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
         the use of "~" in some servers as a home directory indicator.

      5) UTF-7 permits multiple alternate forms to represent the same
         string; in particular, printable US-ASCII characters can be
         represented in encoded form.

   In modified UTF-7, printable US-ASCII characters except for "&"
   represent themselves; that is, characters with octet values 0x20-0x25
   and 0x27-0x7e.  The character "&" (0x26) is represented by the two-
   octet sequence "&-".

   All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all
   Unicode 16-bit octets) are represented in modified BASE64, with a
   further modification from [UTF-7] that "," is used instead of "/".
   Modified BASE64 MUST NOT be used to represent any printing US-ASCII
   character which can represent itself.

   "&" is used to shift to modified BASE64 and "-" to shift back to US-
   ASCII.  All names start in US-ASCII, and MUST end in US-ASCII (that
   is, a name that ends with a Unicode 16-bit octet MUST end with a
   "-").

   For example, here is a mailbox name which mixes English, Japanese,
   and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-
"""
# Note: this is Python 2 code (it relies on the built-in unicode type).
import binascii
import codecs

# encoding
def modified_base64(s):
    # UTF-16BE bytes -> BASE64 without padding, with "," in place of "/"
    s = s.encode('utf-16be')
    return binascii.b2a_base64(s).rstrip('\n=').replace('/', ',')

def doB64(_in, r):
    # flush the pending non-ASCII characters as one "&...-" sequence
    if _in:
        r.append('&%s-' % modified_base64(''.join(_in)))
        del _in[:]

def encoder(s):
    r = []
    _in = []
    for c in s:
        ordC = ord(c)
        if 0x20 <= ordC <= 0x25 or 0x27 <= ordC <= 0x7e:
            doB64(_in, r)
            r.append(c)
        elif c == '&':
            doB64(_in, r)
            r.append('&-')
        else:
            _in.append(c)
    doB64(_in, r)
    return (str(''.join(r)), len(s))

# decoding
def modified_unbase64(s):
    # a2b_base64 tolerates the extra padding added here
    b = binascii.a2b_base64(s.replace(',', '/') + '===')
    return unicode(b, 'utf-16be')

def decoder(s):
    r = []
    decode = []
    for c in s:
        if c == '&' and not decode:
            decode.append('&')
        elif c == '-' and decode:
            if len(decode) == 1:
                # "&-" stands for a literal "&"
                r.append('&')
            else:
                r.append(modified_unbase64(''.join(decode[1:])))
            decode = []
        elif decode:
            decode.append(c)
        else:
            r.append(c)
    if decode:
        r.append(modified_unbase64(''.join(decode[1:])))
    bin_str = ''.join(r)
    return (bin_str, len(s))

class StreamReader(codecs.StreamReader):
    def decode(self, s, errors='strict'):
        return decoder(s)

class StreamWriter(codecs.StreamWriter):
    # a stream writer must provide encode(), not decode()
    def encode(self, s, errors='strict'):
        return encoder(s)

def imap4_utf_7(name):
    # codec search function: only answers for our own codec name
    if name == 'imap4-utf-7':
        return (encoder, decoder, StreamReader, StreamWriter)

codecs.register(imap4_utf_7)
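For readers who want to try the encoding direction outside offlineimap, the following is a rough Python 3 sketch of the same idea as the encoder above. It is not a port that offlineimap ships or uses; the name encode_imap_utf7 is hypothetical, and it follows the same rule as the script (0x20-0x7e pass through, "&" becomes "&-", everything else goes into modified BASE64).

```python
import base64


def encode_imap_utf7(name: str) -> str:
    """Encode normal text as an IMAP4 modified UTF-7 folder name."""
    out = []
    pending = []  # non-ASCII characters waiting to be base64-encoded

    def flush():
        # Emit the pending characters as a single "&...-" sequence.
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_imap_utf7("テスト"))
```

Encoding テスト gives back "&MMYwuTDI-", the folder name from the problem description, which is a quick way to check that the rules were applied in the right order.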
Then, in the "[general]" section of our offlineimap configuration file, we add a line to load this Python script:
pythonfile = ~/.utf7.py
Then, in our remote repository configuration, we add a nametrans option to convert all folder names from the "imap4-utf-7" encoding to our encoding of preference. My Ubuntu installation is all UTF-8, so I convert all folder names to this encoding:
nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')
The "imap4-utf-7" encoding is added by our "~/.utf7.py" script. With this configuration I can now see the correct Japanese names for all the folders (labels) in my Gmail accounts inside Mutt [4].
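To show where the two options sit relative to each other, here is a small Python 3 sketch that parses a sample configuration with the standard configparser module. The account and repository names (Example, ExampleLocal, ExampleRemote) and the "type = IMAP" line are placeholders I made up; only the pythonfile and nametrans lines come from this post.

```python
import configparser

# Sample configuration; every name containing "Example" is a placeholder.
SAMPLE_RC = """\
[general]
accounts = Example
pythonfile = ~/.utf7.py

[Account Example]
localrepository = ExampleLocal
remoterepository = ExampleRemote

[Repository ExampleRemote]
type = IMAP
nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')
"""


def load_sample(text):
    # Interpolation is disabled so "%" characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


cfg = load_sample(SAMPLE_RC)
print(cfg["general"]["pythonfile"])
print(cfg["Repository ExampleRemote"]["nametrans"])
```

The point is only the layout: pythonfile belongs in "[general]", while nametrans belongs in the section of the remote repository.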
fix backspace issue in sup/debian
Create a file, /etc/inputrc for system-wide use or ~/.inputrc for personal use. This is the readline initialization file; readline is a library that some programs (bash, kvt) use to read input (try bind -p to see a list of readline function names and key bindings). Cut and paste the following into the file to make the Delete key delete the character under the cursor, and to make Home and End work as well:
"\e[3~": delete-char
# this is actually equivalent to "\C-?": delete-char
# VT
"\e[1~": beginning-of-line
"\e[4~": end-of-line
# kvt
"\e[H":beginning-of-line
"\e[F":end-of-line
# rxvt and konsole (i.e. the KDE-app...)
"\e[7~":beginning-of-line
"\e[8~":end-of-line
If a system-wide /etc/inputrc was created, add the following line to /etc/profile:
export INPUTRC=/etc/inputrc
Make sure that the stty erase character is set to ^?. Type
stty -a | grep erase
and check if it says
erase = ^?;
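The same check can be scripted. The sketch below is a Python 3 illustration using the standard termios module (Unix only); the helper names caret and current_erase are my own, and current_erase only works when the file descriptor is a real terminal.

```python
import termios


def caret(cc: bytes) -> str:
    """Format a one-byte control character the way stty prints it."""
    code = cc[0]
    if code == 0x7F:
        return "^?"
    if code < 0x20:
        return "^" + chr(code + 0x40)
    return chr(code)


def current_erase(fd: int) -> str:
    """Return the erase character of the terminal behind fd."""
    control_chars = termios.tcgetattr(fd)[6]
    return caret(control_chars[termios.VERASE])


# Usage, from an interactive terminal:
#   import sys
#   print("erase =", current_erase(sys.stdin.fileno()))
```

If this reports ^H instead of ^?, the terminal is in the situation that the next step fixes.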
If it is set to something else (e.g. ^H) then put the following lines in both .bashrc and in either .bash_profile or /etc/profile:
if tty --quiet ; then
    stty erase '^?'
fi
and for xterm and rxvt add this to .Xdefaults:
*ttyModes: erase ^?
            
If you create /etc/inputrc, note that Bash will ignore ~/.inputrc (currently this happens in all distributions except Debian, though this might change in the future). As an alternative, you can edit ~/.inputrc and copy it to /etc/skel/, so that it ends up in the home directories of all new users.
Press the key combination Ctrl-x Ctrl-r (hold down the Control key, press and release the x key, press and release the r key, then release the Control key) to re-read inputrc and see whether the changes take effect. Alternatively, just log in again.
You can also change the key bindings on the fly with the bind command, e.g.:
[localhost]> bind '"\C-?": backward-delete-char'
This is useful for testing different key bindings; if they work, you can put them in ~/.inputrc. Read all about it in the readline manpage.
People using keymaps with e.g. Scandinavian characters who would like bash to display these characters (øl;-) have to add the following lines in .inputrc:
set convert-meta off
set output-meta on
set input-meta on
For more info, check the Danish-HOWTO.
add new disk to the system
fdisk -l
\\ output: list of /dev/xxx
cfdisk /dev/xxx
\\ new, write
mkfs -t ext3 /dev/xx
install debian package and its dependencies
Go to the folder where you have downloaded the *.deb and its dependencies and run: sudo dpkg -i --force-depends *.deb
November 1, 2011