summaryrefslogtreecommitdiff
path: root/basic_python/strings_dicts.rst
blob: b508a3d17b7563a3460535ff6a285adacc4ab137 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
=======
Strings
=======

Strings were briefly introduced previously in the introduction document. In this
section strings will be presented in greater detail. All the standard operations 
that can be performed on sequences such as indexing, slicing, multiplication, length
minimum and maximum can be performed on string variables as well. One thing to
be noted is that strings are immutable, which means that string variables are
unchangeable. Hence, all item and slice assignments on strings are illegal.
Let us look at a few example.

::

  >>> name = 'PythonFreak'
  >>> print name[3]
  h
  >>> print name[-1]
  k
  >>> print name[6:]
  Freak
  >>> name[6:0] = 'Maniac'
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: 'str' object does not support item assignment

This is quite expected, since string objects are immutable as already mentioned.
The error message is clear in mentioning that 'str' object does not support item
assignment.

String Formatting
=================

String formatting can be performed using the string formatting operator represented
as the percent (%) sign. The string placed before the % sign is formatted with 
the value placed to the right of it. Let us look at a simple example.

::

  >>> format = 'Hello %s, from PythonFreak'
  >>> str1 = 'world!'
  >>> print format % str1
  Hello world!, from PythonFreak

The %s parts of the format string are called the coversion specifiers. The coversion
specifiers mark the places where the formatting has to be performed in a string. 
In the example the %s is replaced by the value of str1. More than one value can 
also be formatted at a time by specifying the values to be formatted using tuples
and dictionaries (explained in later sections). Let us look at an example.

::

  >>> format = 'Hello %s, from %s'
  >>> values = ('world!', 'PythonFreak')
  >>> print format % values
  Hello world!, from PythonFreak

In this example it can be observed that the format string contains two conversion 
specifiers and they are formatted using the tuple of values as shown.

The s in %s specifies that the value to be replaced is of type string. Values of 
other types can be specified as well such as integers and floats. Integers are 
specified as %d and floats as %f. The precision with which the integer or the 
float values are to be represented can also be specified using a **.** (**dot**)
followed by the precision value.

String Methods
==============

Similar to list methods, strings also have a rich set of methods to perform various
operations on strings. Some of the most important and popular ones are presented
in this section.

**find**
~~~~~~~~

The **find** method is used to search for a substring within a given string. It 
returns the left most index of the first occurence of the substring. If the 
substring is not found in the string then it returns -1. Let us look at a few 
examples.

::

  >>> longstring = 'Hello world!, from PythonFreak'
  >>> longstring.find('Python')
  19
  >>> longstring.find('Perl')
  -1

**join**
~~~~~~~~

The **join** method is used to join the elements of a sequence. The sequence 
elements that are to be join ed should all be strings. Let us look at a few 
examples.

::
  
  >>> seq = ['With', 'great', 'power', 'comes', 'great', 'responsibility']
  >>> sep = ' '
  >>> sep.join(seq)
  'With great power comes great responsibility'
  >>> sep = ',!'
  >>> sep.join(seq)
  'With,!great,!power,!comes,!great,!responsibility'

*Try this yourself*

::

  >>> seq = [12,34,56,78]
  >>> sep.join(seq)

**lower**
~~~~~~~~~

The **lower** method, as the name indicates, converts the entire text of a string
to lower case. It is specially useful in cases where the programmers deal with case
insensitive data. Let us look at a few examples.

::

  >>> sometext = 'Hello world!, from PythonFreak'
  >>> sometext.lower()
  'hello world!, from pythonfreak'

**replace**
~~~~~~~~~~~

The **replace** method replaces a substring with another substring within
a given string and returns the new string. Let us look at an example.

::

  >>> sometext = 'Concise, precise and criticise is some of the words that end with ise'
  >>> sometext.replace('is', 'are')
  'Concaree, precaree and criticaree are some of the words that end with aree'

Observe here that all the occurences of the substring *is* have been replaced,
even the *is* in *concise*, *precise* and *criticise* have been replaced.

**split**
~~~~~~~~~

The **split** is one of the very important string methods. split is the opposite of the 
**join** method. It is used to split a string based on the argument passed as the
delimiter. It returns a list of strings. By default when no argument is passed it
splits with *space* (' ') as the delimiter. Let us look at an example.

::

  >>> grocerylist = 'butter, cucumber, beer(a grocery item??), wheatbread'
  >>> grocerylist.split(',')
  ['butter', ' cucumber', ' beer(a grocery item??)', ' wheatbread']
  >>> grocerylist.split()
  ['butter,', 'cucumber,', 'beer(a', 'grocery', 'item??),', 'wheatbread']

Observe here that in the second case when the delimiter argument was not set 
**split** was done with *space* as the delimiter.

**strip**
~~~~~~~~~

The **strip** method is used to remove or **strip** off any whitespaces that exist
to the left and right of a string, but not the whitespaces within a string. Let 
us look at an example.

::

  >>> spacedtext = "               Where's the text??                 "
  >>> spacedtext.strip()
  "Where's the text??"

Observe that the whitespaces between the words have not been removed.

::

  Note: Very important thing to note is that all the methods shown above do not
        transform the source string. The source string still remains the same.
	Remember that **strings are immutable**.

Introduction to the standard library
====================================

Python is often referred to as a "Batteries included!" language, mainly because 
of the Python Standard Library. The Python Standard Library provides an extensive
set of features some of which are available directly for use while some require to
import a few **modules**. The Standard Library provides various built-in functions
like:

    * **abs()**
    * **dict()**
    * **enumerate()**

The built-in constants like **True** and **False** are provided by the Standard Library.
More information about the Python Standard Library is available http://docs.python.org/library/


I/O: Reading and Writing Files
==============================

Files are very important aspects when it comes to computing and programming.
Up until now the focus has been on small programs that interacted with users
through **input()** and **raw_input()**. Generally, for computational purposes
it becomes necessary to handle files, which are usually large in size as well.
This section focuses on basics of file handling.

Opening Files
~~~~~~~~~~~~~

Files can be opened using the **open()** method. **open()** accepts 3 arguments
out of which 2 are optional. Let us look at the syntax of **open()**:

*f = open( filename, mode, buffering)*

The *filename* is a compulsory argument while the *mode* and *buffering* are 
optional. The *filename* should be a string and it should be the complete path
to the file to be opened (The path can be absolute or relative). Let us look at
an example.

::

  >>> f = open ('basic_python/interim_assessment.rst')
  
The *mode* argument specifies the mode in which the file has to be opened.
The following are the valid mode arguments:

**r** - Read mode
**w** - Write mode
**a** - Append mode
**b** - Binary mode
**+** - Read/Write mode

The read mode opens the file as a read-only document. The write mode opens the
file in the Write only mode. In the write mode, if the file existed prior to the
opening, the previous contents of the file are erased. The append mode opens the
file in the write mode but the previous contents of the file are not erased and
the current data is appended onto the file.
The binary and the read/write modes are special in the sense that they are added
onto other modes. The read/write mode opens the file in the reading and writing
mode combined. The binary mode can be used to open a files that do not contain 
text. Binary files such as images should be opened in the binary mode. Let us look
at a few examples.

::

  >>> f = open ('basic_python/interim_assessment.rst', 'r')
  >>> f = open ('armstrong.py', 'r+')

The third argument to the **open()** method is the *buffering* argument. This takes
a boolean value, *True* or *1* indicates that buffering has to be enabled on the file,
that is the file is loaded on to the main memory and the changes made to the file are 
not immediately written to the disk. If the *buffering* argument is *0* or *False* the 
changes are directly written on to the disk immediately.

Reading and Writing files
~~~~~~~~~~~~~~~~~~~~~~~~~

**write()**
-----------

**write()**, evidently, is used to write data onto a file. It takes the data to 
be written as the argument. The data can be a string, an integer, a float or any
other datatype. In order to be able to write data onto a file, the file has to
be opened in one of **w**, **a** or **+** modes.

**read()**
----------

**read()** is used to read data from a file. It takes the number of bytes of data
to be read as the argument. If nothing is specified by default it reads the entire 
contents from the current position to the end of file.

Let us look at a few examples:

::

  >>> f = open ('randomtextfile', 'w')
  >>> f.write('Hello all, this is PythonFreak. This is a random text file.')
  >>> f = open ('../randomtextfile', 'r')
  >>> f = open ('../randomtextfile', 'r')
  >>> f.read(5)
  'Hello'
  >>> f.read()
  ' all, this is PythonFreak. This is a random text file.'
  >>> f.close()

**readline()**
--------------

**readline()** is used to read a file line by line. **readline()** reads a line
of a file at a time. When an argument is passed to **readline()** it reads that
many bytes from the current line.

One other method to read a file line by line is using the **read()** and the 
**for** construct. Let us look at this block of code as an example.

::

  >>> f = open('../randomtextfile', 'r')
  >>> for line in f:
  ...     print line
  ... 
  Hello all!
  
  This is PythonFreak on the second line.
  
  This is a random text file on line 3

**close()**
-----------

One must always close all the files that have been opened. Although, files opened
will be closed automatically when the program ends. When files opened in read mode
are not closed it might lead to uselessly locked sometimes. In case of files
opened in the write mode it is more important to close the files. This is because,
Python maybe using the file in the buffering mode and when the file is not closed
the buffer maybe lost completely and the changes made to the file are lost forever.


Dictionaries
============

A dictionary in general, are designed to be able to look up meanings of words. 
Similarly, the Python dictionaries are also designed to look up for a specific
key and retrieve the corresponding value. Dictionaries are data structures that
provide key-value mappings. Dictionaries are similar to lists except that instead
of the values having integer indexes, dictionaries have keys or strings as indexes.
Let us look at an example of how to define dictionaries.

::

  >>> dct = { 'Sachin': 'Tendulkar', 'Rahul': 'Dravid', 'Anil': 'Kumble'}

The dictionary consists of pairs of strings, which are called *keys* and their
corresponding *values* separated by *:* and each of these *key-value* pairs are
comma(',') separated and the entire structure wrapped in a pair curly braces *{}*.

::

  Note: The data inside a dictionary is not ordered. The order in which you enter
  the key-value pairs is not the order in which they are stored in the dictionary.
  Python has an internal storage mechanism for that which is out of the purview 
  of this document.

**dict()**
~~~~~~~~~~

The **dict()** function is used to create dictionaries from other mappings or other
dictionaries. Let us look at an example.

::

  >>> diction = dict(mat = 133, avg = 52.53)

**String Formatting with Dictionaries:**

String formatting was discussed in the previous section and it was mentioned that
dictionaries can also be used for formatting more than one value. This section 
focuses on the formatting of strings using dictionaries. String formatting using
dictionaries is more appealing than doing the same with tuples. Here the *keyword*
can be used as a place holder and the *value* corresponding to it is replaced in
the formatted string. Let us look at an example.

::

  >>> player = { 'Name':'Rahul Dravid', 'Matches':133, 'Avg':52.53, '100s':26 }
  >>> strng = '%(Name)s has played %(Matches)d with an average of %(Avg).2f and has %(100s)d hundreds to his name.'
  >>> print strng % player
  Rahul Dravid has played 133 with an average of 52.53 and has 26 hundreds to his name.

Dictionary Methods
~~~~~~~~~~~~~~~~~~

**clear()**
-----------

The **clear()** method removes all the existing *key-value* pairs from a dictionary.
It returns *None* or rather does not return anything. It is a method that changes
the object. It has to be noted here that dictionaries are not immutable. Let us 
look at an example.

::
  
  >>> dct
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dct.clear()
  >>> dct
  {}

**copy()**
----------

The **copy()** returns a copy of a given dictionary. Let us look at an example.

::

  >>> dct = {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dctcopy = dct.copy()
  >>> dctcopy
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}


**get()**
---------

**get()** returns the *value* for the *key* passed as the argument and if the
*key* does not exist in the dictionary, it returns *None*. Let us look at an
example.

::

  >>> print dctcopy.get('Saurav')
  None
  >>> print dctcopy.get('Anil')
  Kumble

**has_key()**
-------------

This method returns *True* if the given *key* is in the dictionary, else it returns 
*False*.

::

  >>> dctcopy.has_key('Saurav')
  False
  >>> dctcopy.has_key('Sachin')
  True

**pop()**
---------

This method is used to retrieve the *value* of a given *key* and subsequently 
remove the *key-value* pair from the dictionary. Let us look at an example.

::

  >>> print dctcopy.pop('Sachin')
  Tendulkar
  >>> dctcopy
  {'Anil': 'Kumble', 'Rahul': 'Dravid'}

**popitem()**
-------------

This method randomly pops a *key-value* pair from a dictionary and returns it.
The *key-value* pair returned is removed from the dictionary. Let us look at an
example.

::

  >>> print dctcopy.popitem()
  ('Anil', 'Kumble')
  >>> dctcopy
  {'Rahul': 'Dravid'}

  Note that the item chosen is completely random since dictionaries are unordered
  as mentioned earlier.

**update()**
------------

The **update()** method updates the contents of one dictionary with the contents
of another dictionary. For items with existing *keys* their *values* are updated,
and the rest of the items are added. Let us look at an example.

::

  >>> dctcopy.update(dct)
  >>> dct
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}
  >>> dctcopy
  {'Anil': 'Kumble', 'Sachin': 'Tendulkar', 'Rahul': 'Dravid'}