.. Objectives
.. ----------

.. At the end of this tutorial, you will be able to:

.. 1. Sort lines of text files
.. 2. Print lines matching a pattern
.. 3. Translate or delete characters
.. 4. Omit repeated lines

.. Prerequisites
.. -------------

.. 1. Getting started with Linux
.. 2. Redirection and Piping

Script
------

+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the first slide containing title, name of the production                | Hello friends and welcome to the tutorial on 'Text Processing'.                  |
| team along with the logo of MHRD }}}                                             |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show slide with objectives }}}                                               | At the end of this tutorial, you will be able to:                                |
|                                                                                  |                                                                                  |
|                                                                                  | 1. Sort lines of text files                                                      |
|                                                                                  | #. Print lines matching a pattern                                                |
|                                                                                  | #. Translate or delete characters                                                |
|                                                                                  | #. Omit repeated lines                                                           |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Switch to the pre-requisite slide }}}                                        | Before beginning this tutorial, we suggest that you complete the earlier         |
|                                                                                  | tutorials displayed on the slide.                                                |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Open the terminal }}}                                                        | In this tutorial, we shall learn about text processing.                          |
| ::                                                                               | To begin with, consider data kept in two files, namely marks1.txt and            |
|                                                                                  | students.txt. Let us see what data they contain.                                 |
|     cat marks1.txt                                                               | Open a terminal and type,                                                        |
|     cat students.txt                                                             |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Let's say we wish to sort the output in alphabetical order of the names          |
|                                                                                  | of the students. We can use the ``sort`` command for this purpose.               |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | sort             |                                                                                  |
|                                                                                  | We just pipe the combined output of ``cut`` and ``paste`` to the ``sort``        |
|                                                                                  | command as,                                                                      |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Let's say we wish to sort the names based on the marks in the first subject,     |
|                                                                                  | i.e. the first column after the name. The ``sort`` command also allows us to     |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | sort -t " " -k 2 | specify the delimiter between the fields and sort the data on a particular       |
|                                                                                  | field. The ``-t`` option specifies the delimiter and the ``-k`` option           |
|                                                                                  | specifies the field.                                                             |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show slide with, Sort... }}}                                                 | This command gives us the sorted output as required. But what if we would        |
|                                                                                  | like the output to appear in the reverse order? The ``-r`` option sorts the      |
|                                                                                  | output in the reverse order and the ``-n`` option selects numerical sorting.     |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Switch to the terminal }}}                                                   | Let us do it on the terminal and see for ourselves,                              |
| ::                                                                               |                                                                                  |
|                                                                                  |                                                                                  |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - |                  |                                                                                  |
|     sort -t " " -k 2 -rn                                                         |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Suppose, while you are compiling the student marklist, Anne walks up to you and  |
|                                                                                  | wants to know her marks. You, being the kind person that you are, oblige.        |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | grep Anne        | But you do not wish her to see the marks that others have scored. What           |
|                                                                                  | do you do? Here, the ``grep`` command comes to your rescue.                      |
|                                                                                  |                                                                                  |
|                                                                                  | ``grep`` is a command line text search utility. You can use it to search         |
|                                                                                  | for Anne and show her what she scored. ``grep`` allows us to search for a        |
|                                                                                  | search string in files. But we could, as with any other command, pipe the        |
|                                                                                  | output of other commands to it. So, we shall use the earlier combination         |
|                                                                                  | of ``cut`` and ``paste`` to get the marks of the students along with their       |
|                                                                                  | names, and search for Anne in that.                                              |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | This will give us only the line containing the word Anne as the output.          |
|                                                                                  | The ``grep`` command is case-sensitive by default. So, we wouldn't have got      |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | grep -i Anne     | the result if we had searched for anne, with a small a, instead of               |
|                                                                                  | Anne, with a capital A. But what if we didn't know whether the name was          |
|                                                                                  | capitalized or not? ``grep`` allows you to do case-insensitive searches          |
|                                                                                  | by using the ``-i`` option.                                                      |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Now, in another scenario, if we wished to print all the lines which do           |
|                                                                                  | not contain the word Anne, we could use the ``-v`` option.                       |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | grep -iv Anne    |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Switch to the terminal }}}                                                   | ``grep`` allows us to do more complex searches, for instance, searching for      |
| ::                                                                               | lines starting or ending with a particular pattern, and regular                  |
|                                                                                  | expression based searches.                                                       |
|     cat students.txt | tr a-z A-Z                                                |                                                                                  |
|                                                                                  | ``tr`` is a command that takes two sets of characters as parameters, and         |
| {{{ Show slide with, tr }}}                                                      | replaces occurrences of the characters in the first set with the                 |
|                                                                                  | corresponding elements from the other set. It reads from the standard            |
|                                                                                  | input and writes to the standard output.                                         |
|                                                                                  |                                                                                  |
|                                                                                  | For instance, if we wish to replace all the lower case letters in the            |
|                                                                                  | students file with upper case, we can do it as,                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | A common task is to remove empty lines from a file. The ``-s`` flag              |
|                                                                                  | causes ``tr`` to compress sequences of identical adjacent characters in its      |
|     tr -s '\n' '\n'                                                              | output to a single character. For example,                                       |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
|                                                                                  | Hit enter 2-3 times and see that every time we hit enter we get a newline.       |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | It replaces sequences of one or more newline characters with a single newline.   |
|                                                                                  |                                                                                  |
|     cat foo.txt | tr -d '\r' > bar.txt                                           | The ``-d`` flag causes ``tr`` to delete all occurrences of the specified set     |
|                                                                                  | of characters from its input. In this case, only a single character set          |
|                                                                                  | argument is used. The following command removes carriage return characters,      |
|                                                                                  | thereby converting a file in DOS/Windows format to the Unix format.              |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | The ``-c`` flag complements the first set of characters.                         |
|                                                                                  |                                                                                  |
|     tr -cd '[:alnum:]'                                                           |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | It therefore removes all non-alphanumeric characters.                            |
|                                                                                  |                                                                                  |
|     cat items.txt                                                                | Let us consider one more scenario. Suppose we have a list of items, say books,   |
|                                                                                  | and we wish to obtain a list which names all the books only once, without        |
|                                                                                  | any duplicates. To achieve this, we use the ``uniq`` command. Let us first       |
|                                                                                  | have a look at our file,                                                         |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Now, let us try and get rid of the duplicate lines from this file using          |
|                                                                                  | the ``uniq`` command.                                                            |
|     uniq items.txt                                                               |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Nothing happens! Why? The ``uniq`` command removes duplicate lines only when     |
|                                                                                  | they are next to each other. So, henceforth, we get a sorted file from the       |
|     sort items.txt | uniq                                                        | original file and work with that file.                                           |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | The ``uniq -u`` command gives the lines which are unique and do not have any     |
|                                                                                  | duplicates in the file. ``uniq -d`` outputs only those lines which have          |
|     uniq -u items-sorted.txt                                                     | duplicates.                                                                      |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | The ``-c`` option displays the number of times each line occurs in the file.     |
|                                                                                  |                                                                                  |
|     uniq -dc items-sorted.txt                                                    |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show summary slide }}}                                                       | This brings us to the end of this tutorial.                                      |
|                                                                                  | In this tutorial, we have learnt to,                                             |
|                                                                                  |                                                                                  |
|                                                                                  | 1. Use the ``sort`` command to sort lines of text files.                         |
|                                                                                  | #. Use the ``grep`` command to search for text patterns.                         |
|                                                                                  | #. Use the ``tr`` command to translate and/or delete characters.                 |
|                                                                                  | #. Use the ``uniq`` command to omit repeated lines in a text.                    |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show self assessment questions slide }}}                                     | Here are some self assessment questions for you to solve.                        |
|                                                                                  |                                                                                  |
|                                                                                  | 1. To obtain patterns from a file, one per line, which of the following          |
|                                                                                  |    commands is used?                                                             |
|                                                                                  |                                                                                  |
|                                                                                  |    - grep -f                                                                     |
|                                                                                  |    - grep -i                                                                     |
|                                                                                  |    - grep -v                                                                     |
|                                                                                  |    - grep -e                                                                     |
|                                                                                  |                                                                                  |
|                                                                                  | 2. Translate the word 'linux' to upper-case.                                     |
|                                                                                  |                                                                                  |
|                                                                                  | 3. Sort the output of the ``ls -al`` command.                                    |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Solution of self assessment questions on slide }}}                           | And the answers,                                                                 |
|                                                                                  |                                                                                  |
|                                                                                  | 1. In order to obtain patterns from a file, one per line, we use the             |
|                                                                                  |    ``grep`` command along with the ``-f`` option.                                |
|                                                                                  |                                                                                  |
|                                                                                  | 2. We use the ``tr`` command to change the word into upper case,                 |
|                                                                                  |    ::                                                                            |
|                                                                                  |                                                                                  |
|                                                                                  |        echo 'linux' | tr a-z A-Z                                                 |
|                                                                                  |                                                                                  |
|                                                                                  | 3. We use the ``sort`` command as,                                               |
|                                                                                  |    ::                                                                            |
|                                                                                  |                                                                                  |
|                                                                                  |        ls -al | sort -n -k5                                                      |
|                                                                                  |                                                                                  |
|                                                                                  |    The ``-n`` option means "sort numerically", and the ``-k5`` option means      |
|                                                                                  |    to use column five as the sort key.                                           |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the SDES & FOSSEE slide }}}                                             | Software Development techniques for Engineers and Scientists - SDES, is an       |
|                                                                                  | initiative by FOSSEE. For more information, please visit the given link.         |
|                                                                                  |                                                                                  |
|                                                                                  | Free and Open-source Software for Science and Engineering Education - FOSSEE, is |
|                                                                                  | based at IIT Bombay and is funded by MHRD as part of the National Mission on     |
|                                                                                  | Education through ICT.                                                           |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the "About the Spoken Tutorial Project" slide }}}                       | Watch the video available at the following link. It summarises the Spoken        |
|                                                                                  | Tutorial project. If you do not have good bandwidth, you can download and        |
|                                                                                  | watch it.                                                                        |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the "Spoken Tutorial Workshops" slide }}}                               | The Spoken Tutorial Project Team conducts workshops using spoken tutorials and   |
|                                                                                  | gives certificates to those who pass an online test.                             |
|                                                                                  |                                                                                  |
|                                                                                  | For more details, contact contact@spoken-tutorial.org                            |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the "Acknowledgements" slide }}}                                        | Spoken Tutorial Project is a part of the "Talk to a Teacher" project.            |
|                                                                                  | It is supported by the National Mission on Education through ICT, MHRD,          |
|                                                                                  | Government of India. More information on this mission is available at the        |
|                                                                                  | given link.                                                                      |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the Thank you slide }}}                                                 | Hope you have enjoyed this tutorial and found it useful.                         |
|                                                                                  | Thank you!                                                                       |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
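
The commands used in this script can be tried end to end with a small sketch like the one below. The script does not show the contents of marks1.txt, students.txt or items.txt, so the sample data here is made up for illustration, and the variable names are hypothetical. The step that creates items-sorted.txt is also an assumption, since the script uses that file without showing how it is produced.

```shell
#!/bin/sh
# Sketch only: the file contents below are made-up sample data, not the
# tutorial's actual marks1.txt, students.txt or items.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Anne' 'bob' 'Charles' 'anne' > students.txt
printf '%s\n' '1 89 92' '2 75 64' '3 98 71' '4 55 60' > marks1.txt
printf '%s\n' 'pen' 'book' 'pen' 'ink' > items.txt

# Join names and marks: drop the first field of marks1.txt, then paste the
# rest next to the names ("-" makes paste read the piped data).
marklist=$(cut -d " " -f 2- marks1.txt | paste -d " " students.txt -)

# sort: key starts at field 2, numeric (-n), highest first (-r).
by_marks=$(printf '%s\n' "$marklist" | sort -t " " -k 2 -rn)

# grep: -i ignores case, -v inverts the match.
anne_rows=$(printf '%s\n' "$marklist" | grep -i Anne)
other_rows=$(printf '%s\n' "$marklist" | grep -iv Anne)

# tr: translate, squeeze repeats (-s), delete (-d).
upper=$(echo 'linux' | tr a-z A-Z)
squeezed=$(printf 'a\n\n\nb\n' | tr -s '\n')
no_cr=$(printf 'dos line\r\n' | tr -d '\r')

# uniq only collapses adjacent duplicates, so sort first.
# Assumption: items-sorted.txt is simply the sorted copy of items.txt.
sort items.txt > items-sorted.txt
unique_items=$(uniq items-sorted.txt)
only_dups=$(uniq -d items-sorted.txt)
never_dup=$(uniq -u items-sorted.txt)

printf '%s\n' "$by_marks" "$anne_rows" "$upper" "$unique_items"
```

Each result is kept in a variable so that it can be inspected separately; in an interactive session the same pipelines can be typed directly, as in the script above.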