UTF-8 codes in Terminal

Hi friends,

Last week was fantastic for me; I hope you enjoyed yours too. Sometimes I realize that reading the ‘man’ pages and the HTML pages in ‘/usr/share/doc’ gives us information that we never get from ‘google’. Last Saturday, I read the ‘utf8’, ‘unicode’ and ‘console_codes’ man pages once more to refresh my memory. I also came up with two scripts that do some quick work: they convert tagged Unicode code points to ‘utf8’ byte sequences and display the result in the terminal. These scripts only work if you have the ‘/usr/share/i18n/charmaps/UTF-8.gz’ file. Here are the scripts.

unicode2utf8.bash

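#!/bin/bash

# Charmap that maps <Uxxxx> tags to their UTF-8 byte sequences.
UTF8FILE="/usr/share/i18n/charmaps/UTF-8.gz"

# Read the tags from standard input, upper-case them, and put a space
# after every '>' so that adjacent tags split into separate words.
BUFFER=$(tr 'a-z' 'A-Z' | sed -e 's/>/> /g')
UTF8BUFFER=""

# The unquoted expansion splits on spaces, tabs and newlines alike.
for UNICODE in ${BUFFER}
do
	# Look the tag up as a fixed string; column two holds the bytes.
	UTF8BUFFER="${UTF8BUFFER}$(gunzip -c "${UTF8FILE}" |
	grep -F -- "${UNICODE}" |
	awk '{print $2;}')"
done

# ESC % G selects UTF-8 on the console, ESC % @ selects the default again.
# The substitution turns /xe0 into \xe0 so that echo -e emits raw bytes.
echo -e "\x1b%G${UTF8BUFFER//\//\\}\x1b%@"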

This script takes tagged Unicode code points (for example ‘<U0B85>’) on standard input and displays the resolved glyphs on standard output. Here is an example screenshot.

unicode2utf8.bash script output
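
The heart of the script is the charmap lookup followed by the ‘/’ to ‘\’ substitution. Here is a minimal sketch of just that step, without needing the charmap file. The sample line is written by hand in the layout I assume the charmap uses (tag, byte sequence, name), and the function name ‘charmap_line_to_glyph’ is only an illustration, not part of the script above.

```shell
#!/bin/bash

# Hypothetical helper: turn one charmap-style line into its glyph.
charmap_line_to_glyph() {
	local tag bytes name
	# Split the line into tag, byte sequence and character name.
	read -r tag bytes name <<< "$1"
	# /xe0/xae/x85 becomes \xe0\xae\x85, which %b expands to raw bytes.
	printf '%b\n' "${bytes//\//\\}"
}

# Sample line in the assumed charmap layout (typed by hand for illustration).
charmap_line_to_glyph '<U0B85>     /xe0/xae/x85         TAMIL LETTER A'
```

In a UTF-8 terminal this should show the Tamil letter ‘அ’. I used printf with ‘%b’ here instead of ‘echo -e’, but the idea is the same.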

unicodes.bash

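#!/bin/bash

# Charmap that maps <Uxxxx> tags to their UTF-8 byte sequences.
UTF8FILE="/usr/share/i18n/charmaps/UTF-8.gz"

# Every command-line argument is a pattern matched against character names.
for LANGUAGE
do
	LANGUAGE=$(echo "${LANGUAGE}" | tr 'a-z' 'A-Z')

	# Keep the lines whose third field (first word of the name) matches.
	gunzip -c "${UTF8FILE}" |
	awk -v pattern="${LANGUAGE}" '$3 ~ pattern' |
	while read -r UNICODE UTF8CODE DESCRIPTION
	do
		# Print the tag, the glyph (console in UTF-8 mode) and the name.
		echo -n -e "${UNICODE}\t"
		echo -n -e "\x1b%G${UTF8CODE//\//\\}\x1b%@"
		echo -e "\t${DESCRIPTION}"
	done
done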

This script will be very interesting to you. If you give it a pattern matching your language, say ‘tam’ for Tamil, it fetches the Unicode details of every character whose name matches. Take a look at the screenshot.

unicodes.bash script output
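
The filtering step can be tried on its own too. This is a small sketch under my own assumptions: the three sample lines are typed by hand in the charmap layout, and ‘filter_by_name’ is a made-up helper name. It matches the pattern against the third field, which is the first word of the character name.

```shell
#!/bin/bash

# Hand-written sample lines in the assumed charmap layout.
sample='<U0B85>     /xe0/xae/x85         TAMIL LETTER A
<U0B86>     /xe0/xae/x86         TAMIL LETTER AA
<U0041>     /x41         LATIN CAPITAL LETTER A'

# Hypothetical helper: keep lines whose third field matches the pattern
# and print the tag and the byte sequence separated by a tab.
filter_by_name() {
	awk -v pattern="$1" '$3 ~ pattern {print $1 "\t" $2}'
}

printf '%s\n' "$sample" | filter_by_name TAM
```

Only the two Tamil lines should come through. The pattern has to be in upper case here because the names are in upper case; the script above takes care of that with ‘tr’.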

I actually intended to write my own algorithm to convert Unicode to UTF-8, but I have started learning an art called ‘don’t reinvent the wheel’. So I used that file to do the conversion.
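
For the curious, the hand-written conversion would look roughly like the sketch below. It is not what the scripts above do. The function name ‘codepoint_to_utf8’ is made up, it prints the bytes in the same ‘/xNN’ notation as the charmap, and it does no validation (surrogates and values above 10FFFF are not rejected).

```shell
#!/bin/bash

# Hypothetical helper: encode a hexadecimal code point as UTF-8 bytes,
# printed in the charmap's /xNN notation.
codepoint_to_utf8() {
	local cp=$((16#$1))
	if (( cp < 0x80 )); then
		# One byte: 0xxxxxxx
		printf '/x%02x' "$cp"
	elif (( cp < 0x800 )); then
		# Two bytes: 110xxxxx 10xxxxxx
		printf '/x%02x/x%02x' \
			$(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
	elif (( cp < 0x10000 )); then
		# Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
		printf '/x%02x/x%02x/x%02x' \
			$(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
			$(( 0x80 | (cp & 0x3F) ))
	else
		# Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
		printf '/x%02x/x%02x/x%02x/x%02x' \
			$(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
			$(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
	fi
	printf '\n'
}

codepoint_to_utf8 0B85
```

For ‘0B85’ this works out to ‘/xe0/xae/x85’, the three bytes of the Tamil letter ‘அ’.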

One more thing: there is a quick way to type your language's characters in the console. Press ‘CTRL+SHIFT+U’ and then type the Unicode code point in hexadecimal. For example,

‘CTRL+SHIFT+U0B85’ will display ‘அ’ in the console.
