.. Objectives
.. ----------
   
   .. At the end of this tutorial, you will be able to:
   
   ..   1. Sort lines of text files
   ..   2. Print lines matching a pattern
   ..   3. Translate or delete characters
   ..   4. Omit repeated lines


.. Prerequisites
.. -------------

..   1. Getting started with Linux
..   2. Redirection and Piping


 
Script
------



+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the first slide containing title, name of the production                | Hello friends, and welcome to the tutorial on 'Text Processing'.                 |
| team along with the logo of MHRD }}}                                             |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show slide with objectives }}}                                               | At the end of this tutorial, you will be able to,                                |
|                                                                                  |                                                                                  |
|                                                                                  |  1. Sort lines of text files                                                     |
|                                                                                  |  #. Print lines matching a pattern                                               |
|                                                                                  |  #. Translate or delete characters                                               |
|                                                                                  |  #. Omit repeated lines.                                                         |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Switch to the pre-requisite slide }}}                                        | Before beginning this tutorial, we suggest that you complete the                 |
|                                                                                  | prerequisite tutorials listed on this slide.                                     |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Open the terminal }}}                                                        | In this tutorial, we shall learn about text processing.                          |
| ::                                                                               | To begin with, consider data kept in two files, namely marks1.txt and            |
|                                                                                  | students.txt.                                                                    |
|     cat marks1.txt                                                               | Let us see what data they contain. Open a terminal and type,                     |
|     cat students.txt                                                             |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Let's say we wish to sort the output in alphabetical order of the                |
|                                                                                  | students' names. We can use the ``sort`` command for this purpose.               |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | sort             |                                                                                  |
|                                                                                  | We just pipe the combined output of ``cut`` and ``paste`` to the                 |
|                                                                                  | ``sort`` command.                                                                |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Let's say we wish to sort the names based on the marks in the first              |
|                                                                                  | subject, i.e. the first column after the name. The ``sort`` command also         |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | sort -t " " -k 2 | lets us specify the delimiter between the fields and sort the data on a          |
|                                                                                  | particular field. The ``-t`` option specifies the delimiter and the              |
|                                                                                  | ``-k`` option specifies the field.                                               |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show slide with, Sort... }}}                                                 | This command gives us a sorted output as required. But what if we would          |
|                                                                                  | like the output to appear in the reverse order? The ``-r`` option sorts the      |
|                                                                                  | output in reverse order, and the ``-n`` option selects numerical sorting.        |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Switch to the terminal }}}                                                   | Let us do it on the terminal and see for ourselves,                              |
| ::                                                                               |                                                                                  |
|                                                                                  |                                                                                  |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt -|                   |                                                                                  |
|     sort -t " " -k 2 -rn                                                         |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Suppose that, while you are compiling the student mark list, Anne walks up to    |
|                                                                                  | you and wants to know her marks. You, being the kind person that you are,        |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | grep Anne        | oblige. But you do not wish her to see the marks that others have scored.        |
|                                                                                  | What do you do? Here, the ``grep`` command comes to your rescue.                 |
|                                                                                  |                                                                                  |
|                                                                                  | ``grep`` is a command-line text search utility. You can use it to search         |
|                                                                                  | for Anne and show her what she scored. ``grep`` allows us to search for a        |
|                                                                                  | string in files. But we could, as with any other command, pipe the               |
|                                                                                  | output of other commands to it. So, we shall use the previous combination        |
|                                                                                  | of ``cut`` and ``paste`` to get the marks of students along with their           |
|                                                                                  | names, and search for Anne in that.                                              |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | This will give us only the line containing the word Anne as the output.          |
|                                                                                  | The grep command is by default case-sensitive. So, we wouldn't have got          |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | grep -i Anne     | the result if we had searched for anne, with a small a, instead of               |
|                                                                                  | Anne, with a capital A. But what if we didn't know whether the name was          |
|                                                                                  | capitalized or not? ``grep`` allows you to do case-insensitive searches          |
|                                                                                  | by using the ``-i`` option.                                                      |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Now, in another scenario, if we wished to print all the lines, which do          |
|                                                                                  | not contain the word Anne, we could use the ``-v`` option.                       |
|     cut -d " " -f 2- marks1.txt | paste -d " " students.txt - | grep -iv Anne    |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Switch to the terminal }}}                                                   | ``grep`` allows us to do more complex searches, for instance, searching for      |
| ::                                                                               | lines starting or ending with a particular pattern, and regular                  |
|                                                                                  | expression based searches.                                                       |
|     cat students.txt | tr a-z A-Z                                                |                                                                                  |
|                                                                                  | {{{ Show slide with, tr }}}                                                      |
|                                                                                  |                                                                                  |
|                                                                                  | ``tr`` is a command that takes two sets of characters as parameters, and         |
|                                                                                  | replaces occurrences of the characters in the first set with the                 |
|                                                                                  | corresponding elements from the other set. It reads from the standard            |
|                                                                                  | input and writes to the standard output.                                         |
|                                                                                  |                                                                                  |
|                                                                                  | For instance, if we wish to replace all the lower case letters in the            |
|                                                                                  | students file with upper case, we can do it as,                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | A common task is to remove empty lines from a file. The ``-s`` flag              |
|                                                                                  | causes ``tr`` to compress sequences of identical adjacent characters in its      |
|     tr -s '\n' '\n'                                                              | output to a single character. For example,                                       |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Hit enter 2-3 times and see that every time we hit enter we get a newline.       |
|                                                                                  |                                                                                  |
|     <Enter>                                                                      |                                                                                  |
|     <Enter>                                                                      |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | It replaces sequences of one or more newline characters with a single newline.   |
|                                                                                  |                                                                                  |
|     cat foo.txt | tr -d '\r' > bar.txt                                           | The ``-d`` flag causes ``tr`` to delete all characters in the specified set      |
|                                                                                  | from its input. In this case, only a single character set argument is            |
|                                                                                  | used. The following command removes carriage return characters, thereby          |
|                                                                                  | converting a file in DOS/Windows format to the Unix format.                      |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | The ``-c`` flag complements the first set of characters.                         |
|                                                                                  |                                                                                  |
|     tr -cd '[:alnum:]'                                                           |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | It therefore removes all non-alphanumeric characters.                            |
|                                                                                  |                                                                                  |
|     cat items.txt                                                                | Let us consider one more scenario. Suppose we have a list of items, say books,   |
|                                                                                  | and we wish to obtain a list which names each book only once, without            |
|                                                                                  | any duplicates. To achieve this, we use the ``uniq`` command. Let us first       |
|                                                                                  | have a look at our file                                                          |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Now, let us try and get rid of the duplicate lines from this file using          |
|                                                                                  | the ``uniq`` command.                                                            |
|     uniq items.txt                                                               |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | Nothing changes! Why? The ``uniq`` command removes duplicate lines only when     |
|                                                                                  | they are next to each other. So we first sort the file and pipe the result       |
|     sort items.txt | uniq                                                        | to ``uniq``. We also save the sorted output as items-sorted.txt, and work        |
|     sort items.txt > items-sorted.txt                                            | with that file from here on.                                                     |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | The ``uniq -u`` command prints only the lines which are unique and do not        |
|                                                                                  | have any duplicates in the file. ``uniq -d`` prints one copy of each             |
|     uniq -u items-sorted.txt                                                     | line that has duplicates.                                                        |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| ::                                                                               | The ``-c`` option displays the number of times each line occurs in the file.     |
|                                                                                  |                                                                                  |
|     uniq -dc items-sorted.txt                                                    |                                                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show summary slide }}}                                                       | This brings us to the end of this tutorial.                                      |
|                                                                                  | In this tutorial, we have learnt to,                                             |
|                                                                                  |                                                                                  |
|                                                                                  |   1. Use the ``sort`` command to sort lines of text files.                       |
|                                                                                  |   #. Use the ``grep`` command to search for text patterns.                       |
|                                                                                  |   #. Use the ``tr`` command to translate and/or delete characters.               |
|                                                                                  |   #. Use the ``uniq`` command to omit repeated lines in a text.                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show self assessment questions slide }}}                                     | Here are some self assessment questions for you to solve                         |
|                                                                                  |                                                                                  |
|                                                                                  | 1. Which of the following commands reads patterns from a file, one per line?     |
|                                                                                  |                                                                                  |
|                                                                                  |     - grep -f                                                                    |
|                                                                                  |     - grep -i                                                                    |
|                                                                                  |     - grep -v                                                                    |
|                                                                                  |     - grep -e                                                                    |
|                                                                                  |                                                                                  |
|                                                                                  | 2. Translate the word 'linux' to upper-case.                                     |
|                                                                                  |                                                                                  |
|                                                                                  | 3. Sort the output of the ``ls -al`` command.                                    |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Solution of self assessment questions on slide }}}                           | And the answers,                                                                 |
|                                                                                  |                                                                                  |
|                                                                                  | 1. To read patterns from a file, one per line, we use the ``grep`` command       |
|                                                                                  |    along with the ``-f`` option.                                                 |
|                                                                                  |                                                                                  |
|                                                                                  | 2. We use the ``tr`` command to change the word into upper case::                |
|                                                                                  |                                                                                  |
|                                                                                  |      echo 'linux' | tr a-z A-Z                                                   |
|                                                                                  |                                                                                  |
|                                                                                  | 3. We use the ``sort`` command as::                                              |
|                                                                                  |                                                                                  |
|                                                                                  |      ls -al | sort -n -k5                                                        |
|                                                                                  |                                                                                  |
|                                                                                  |    The ``-n`` option means "sort numerically", and the ``-k5`` option means      |
|                                                                                  |    to sort on the fifth column.                                                  |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the SDES & FOSSEE slide }}}                                             | Software Development techniques for Engineers and Scientists - SDES, is an       |
|                                                                                  | initiative by FOSSEE. For more information, please visit the given link.         |
|                                                                                  |                                                                                  |
|                                                                                  | Free and Open-source Software for Science and Engineering Education - FOSSEE, is |
|                                                                                  | based at IIT Bombay and is funded by MHRD as part of the National Mission on     |
|                                                                                  | Education through ICT.                                                           |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the "About the Spoken Tutorial Project" slide }}}                       | Watch the video available at the following link. It summarises the Spoken        |
|                                                                                  | Tutorial project. If you do not have good bandwidth, you can download and        |
|                                                                                  | watch it.                                                                        |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the "Spoken Tutorial Workshops" slide }}}                               | The Spoken Tutorial Project Team conducts workshops using spoken tutorials       |
|                                                                                  | and gives certificates to those who pass an online test.                         |
|                                                                                  |                                                                                  |
|                                                                                  | For more details, contact contact@spoken-tutorial.org                            |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the "Acknowledgements" slide }}}                                        | The Spoken Tutorial Project is a part of the "Talk to a Teacher" project.        |
|                                                                                  | It is supported by the National Mission on Education through ICT, MHRD,          |
|                                                                                  | Government of India. More information on this mission is available at the        |
|                                                                                  | given link.                                                                      |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| {{{ Show the Thank you slide }}}                                                 | Hope you have enjoyed this tutorial and found it useful.                         |
|                                                                                  | Thank you!                                                                       |
+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
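
As a recap of the commands used in the script above, here is a minimal sketch that exercises ``sort``, ``grep``, ``tr`` and ``uniq`` on small made-up inputs. The files ``marks.txt`` and ``items.txt``, their contents, and the variable names are assumptions for illustration only; they are not the tutorial's ``marks1.txt``, ``students.txt`` or ``items.txt``.

```shell
#!/bin/sh
# Hypothetical sample data, created in a temporary directory.
export LC_ALL=C                     # keep the sort order predictable
workdir=$(mktemp -d)
printf '%s\n' 'Anne 89 92' 'Bella 70 65' 'Carl 78 81' > "$workdir/marks.txt"
printf '%s\n' pen book pen ink book > "$workdir/items.txt"

# sort: numeric, descending, on the second space-separated field only
by_marks=$(sort -t ' ' -k 2,2nr "$workdir/marks.txt")

# grep: case-insensitive match, and the inverse selection with -v
anne=$(grep -i 'anne' "$workdir/marks.txt")
others=$(grep -iv 'anne' "$workdir/marks.txt")

# tr: translate to upper case, squeeze repeated newlines,
# and delete everything that is not alphanumeric
upper=$(printf 'linux\n' | tr 'a-z' 'A-Z')
squeezed=$(printf 'a\n\n\nb\n' | tr -s '\n')
alnum=$(printf 'a-b c_1!\n' | tr -cd '[:alnum:]')

# uniq: only collapses adjacent duplicates, so sort first
distinct=$(sort "$workdir/items.txt" | uniq)
singles=$(sort "$workdir/items.txt" | uniq -u)
repeated=$(sort "$workdir/items.txt" | uniq -d)

printf '%s\n' "$by_marks" "$anne" "$upper" "$distinct"
rm -rf "$workdir"
```

The sketch uses ``-k 2,2`` rather than ``-k 2`` so that the sort key stops at the end of the second field; with ``-k 2`` the key runs from the second field to the end of the line.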