summaryrefslogtreecommitdiff
path: root/basic_python/io_files_parsing.rst
diff options
context:
space:
mode:
Diffstat (limited to 'basic_python/io_files_parsing.rst')
-rw-r--r--basic_python/io_files_parsing.rst386
1 files changed, 0 insertions, 386 deletions
diff --git a/basic_python/io_files_parsing.rst b/basic_python/io_files_parsing.rst
deleted file mode 100644
index 6bbc2e4..0000000
--- a/basic_python/io_files_parsing.rst
+++ /dev/null
@@ -1,386 +0,0 @@
-I/O
-===
-
-Input and Output are used in almost every program, we write. We shall now
-learn how to
-
- * Output data
- * Take input from the user
-
-Let's start with printing a string.
-
-::
-
- a = "This is a string"
- a
- print a
-
-
-``print a``, obviously, is printing the value of ``a``.
-
-As you can see, even when you type just ``a``, the value of ``a`` is shown.
-But there is a difference.
-
-Typing ``a`` shows the value of ``a`` while ``print a`` prints the string.
-This difference becomes more evident when we use strings with newlines in
-them.
-
-::
-
- b = "A line \n New line"
- b
- print b
-
-As you can see, just typing ``b`` shows that ``b`` contains a newline
-character. While typing ``print b`` prints the string and hence the newline.
-
-Moreover when we type just ``a``, the value ``a`` is shown only in
-interactive mode and does not have any effect on the program while running it
-as a script.
-
-We shall look at different ways of outputting the data.
-
-``print`` statement in Python supports string formatting. Various arguments
-can be passed to print using modifiers.
-
-::
-
- x = 1.5
- y = 2
- z = "zed"
- print "x is %2.1f y is %d z is %s" %(x, y, z)
-
-As you can see, the values of x, y and z are substituted in place of
-``%2.1f``, ``%d`` and ``%s`` respectively.
-
-We can also see that ``print`` statement prints a new line character
-at the end of the line, everytime it is called. This can be suppressed
-by using a "," at the end ``print`` statement.
-
-Let us see this by typing out following code on an editor as ``print_example.py``
-
-Open an editor, like ``scite``, ``emacs`` or ``vim`` and type the following.
-
-::
-
- print "Hello"
- print "World"
-
- print "Hello",
- print "World"
-
-Now we run the script using ``%run /home/fossee/print_example.py`` in the
-interpreter. As we can see, the print statement when used with comma in the
-end, prints a space instead of a new line.
-
-Note that Python files are saved with an extension ``.py``.
-
-Now we shall look at taking input from the user. We will use the
-``raw_input`` for this.
-
-Let's type
-
-::
-
- ip = raw_input()
-
-The cursor is blinking indicating that it is waiting for input. We now type
-some random input,
-
-::
-
- an input
-
-and hit enter.
-
-Now let us see what is the value of ip by typing.
-
-::
-
- ip
-
-We can see that it contains the string "an input"
-
-Note that raw_input always gives us a string. For example
-
-
-::
-
- c = raw_input()
- 5.6
- c
-
-Now let us see the type of c.
-
-::
-
- type(c)
-
-We see that c is a string. This implies that anything you enter as input,
-will be taken as a string no matter what you enter.
-
-``raw_input`` can also display a prompt to assist the user.
-
-::
-
- name = raw_input("Please enter your name: ")
-
-prints the string given as argument and then waits for the user input.
-
-Files
-=====
-
-We shall, now, learn to read files, and do some basic actions on the file,
-like opening and reading a file, closing a file, iterating through the file
-line-by-line, and appending the lines of a file to a list.
-
-Let us first open the file, ``pendulum.txt`` present in ``/home/fossee/``.
-The file can be opened using either the absolute path or relative path. In
-all of these examples we shall assume that our present working directory is
-``/home/fossee/`` and hence we only need to specify the file name. To check
-the present working directory, we can use the ``pwd`` command and to change
-our working directory we can use the ``cd`` command.
-
-::
-
- pwd
- cd /home/fossee
-
-Now, to open the file
-
-::
-
- f = open('pendulum.txt')
-
-``f`` is called a file object. Let us type ``f`` on the terminal to
-see what it is.
-
-::
-
- f
-
-The file object shows, the file which is open and the mode (read or write) in
-which it is open. Notice that it is open in read only mode, here.
-
-We shall first learn to read the whole file into a single variable. Later, we
-shall look at reading it line-by-line. We use the ``read`` method of ``f`` to
-read, all the contents of the file into the variable ``pend``.
-
-::
-
- pend = f.read()
-
-Now, let us see what is in ``pend``, by typing
-
-::
-
- print pend
-
-We can see that ``pend`` has all the data of the file. Type just ``pend`` to
-see more explicitly, what it contains.
-
-::
-
- pend
-
-We can split the variable ``pend`` into a list, ``pend_list``, of the lines
-in the file.
-
-::
-
- pend_list = pend.splitlines()
-
- pend_list
-
-Now, let us learn to read the file line-by-line. But, before that we will
-have to close the file, since the file has already been read till the end.
-
-Let us close the file opened into f.
-
-::
-
- f.close()
-
-Let us again type ``f`` on the prompt to see what it shows.
-
-::
-
- f
-
-Notice, that it now says the file has been closed. It is a good programming
-practice to close any file objects that we have opened, after their job is
-done.
-
-Let us, now move on to reading files line-by-line.
-
-To read the file line-by-line, we iterate over the file object line-by-line,
-using the ``for`` command. Let us iterate over the file line-wise and print
-each of the lines.
-
-::
-
- for line in open('pendulum.txt'):
- print line
-
-As we already know, ``line`` is a dummy variable, sometimes called the loop
-variable, and it is not a keyword. We could have used any other variable
-name, but ``line`` seems meaningful enough.
-
-Instead of just printing the lines, let us append them to a list,
-``line_list``. We first initialize an empty list, ``line_list``.
-
-::
-
- line_list = [ ]
-
-Let us then read the file line-by-line and then append each of the lines, to
-the list. We could, as usual close the file using ``f.close`` and re-open it.
-But, this time, let's leave alone the file object ``f`` and directly open the
-file within the for statement. This will save us the trouble of closing the
-file, each time we open it.
-
-::
-
- for line in open('pendulum.txt'):
- line_list.append(line)
-
-Let us see what ``line_list`` contains.
-
-::
-
- line_list
-
-Notice that ``line_list`` is a list of the lines in the file, along with the
-newline characters. If you noticed, ``pend_list`` did not contain the newline
-characters, because the string ``pend`` was split on the newline characters.
-
-Let us now look at how to parse data, learn some string operations to parse
-files and get data out of them, and data-type conversions.
-
-We have a file containing a huge number of records. Each record corresponds
-to the information of a student.
-
-::
-
- A;010002;ANAND R;058;037;42;35;40;212;P;;
-
-
-Each record consists of fields seperated by a ";". The first record is region
-code, then roll number, then name, marks of second language, first language,
-maths, science and social, total marks, pass/fail indicatd by P or F and
-finally W if withheld and empty otherwise.
-
-Our job is to calculate the arithmetic mean of all the maths marks in the
-region B.
-
-Now what is parsing data.
-
-From the input file, we can see that the data we have is in the form of text.
-Parsing this data is all about reading it and converting it into a form which
-can be used for computations -- in our case, sequence of numbers.
-
-Let us learn about tokenizing strings or splitting a string into smaller
-units or tokens. Let us define a string first.
-
-::
-
- line = "parse this string"
-
-We are now going to split this string on whitespace.
-
-::
-
- line.split()
-
-As you can see, we get a list of strings. Which means, when ``split`` is
-called without any arguments, it splits on whitespace. In simple words, all
-the spaces are treated as one big space.
-
-``split`` also can split on a string of our choice. This is acheived by
-passing that as an argument. But first lets define a sample record from the
-file.
-
-::
-
- record = "A;015163;JOSEPH RAJ S;083;042;47;AA;72;244;;;"
- record.split(';')
-
-We can see that the string is split on ';' and we get each field seperately.
-We can also observe that an empty string appears in the list since there are
-two semi colons without anything in between.
-
-To recap, ``split`` splits on whitespace if called without an argument and
-splits on the given argument if it is called with an argument.
-
-Now that we know how to split a string, we can split the record and retrieve
-each field seperately. But there is one problem. The region code "B" and a
-"B" surrounded by whitespace are treated as two different regions. We must
-find a way to remove all the whitespace around a string so that "B" and a "B"
-with white spaces are dealt as same.
-
-This is possible by using the ``strip`` method of strings. Let us define a
-string,
-
-::
-
- word = " B "
- word.strip()
-
-We can see that strip removes all the whitespace around the sentence.
-
-The splitting and stripping operations are done on a string and their result
-is also a string. Hence the marks that we have are still strings and
-mathematical operations are not possible on them. We must convert them into
-numbers (integers or floats), before we can perform mathematical operations
-on them.
-
-We have seen that, it is possible to convert float into integers using
-``int``. We shall now convert strings into floats.
-
-::
-
- mark_str = "1.25"
- mark = float(mark_str)
- type(mark_str)
- type(mark)
-
-We can see that string, ``mark_str`` is converted to a ``float``. We can
-perform mathematical operations on them now.
-
-Now that we have all the machinery required to parse the file, let us solve
-the problem. We first read the file line by line and parse each record. We
-see if the region code is B and store the marks accordingly.
-
-::
-
- math_B = [] # an empty list to store the marks
- for line in open("sslc1.txt"):
- fields = line.split(";")
-
- reg_code = fields[0]
- reg_code_clean = reg_code.strip()
-
- math_mark_str = fields[5]
- math_mark = float(math_mark_str)
-
- if reg_code == "B":
- math_B.append(math_mark)
-
-
-Now we have all the maths marks of region "B" in the list math_marks_B. To
-get the mean, we just have to sum the marks and divide by the length.
-
-::
-
- math_B_mean = sum(math_B) / len(math_B)
- math_B_mean
-
-..
- Local Variables:
- mode: rst
- indent-tabs-mode: nil
- sentence-end-double-space: nil
- fill-column: 77
- End:
-
-