Reading Key Value Pair in Python Separated by Space

Ch.6: Dictionaries and strings

Hans Petter Langtangen [1, 2]
[1] Simula Research Laboratory
[2] University of Oslo, Dept. of Informatics

Aug 15, 2015

Goals

Learn more about file reading
Store file data in a new object type: dictionary
Interpret content in files via string manipulation

The main focus in the course is on working with files, dictionaries and strings. The book has additional material on how to utilize data from the Internet.

Dictionaries

import os

figfiles = {'fig1.pdf': 81761, 'fig2.png': 8754}
filename = 'fig3.png'   # must name an existing file
figfiles['fig3.png'] = os.path.getsize(filename)
for name in figfiles:
    print 'File size of %s is %d:' % (name, figfiles[name])

A dictionary is a generalization of a list

Features of lists:

store a sequence of elements in a single object ([1,3,-1])
each element is a Python object
the elements are indexed by integers 0, 1, ...

Dictionaries can index objects in a collection via text
(= "lists with text index")
Dictionary in Python is called hash, HashMap and associative array in other languages

The list index is sometimes unnatural for locating an element of a collection of objects

Suppose we need to store the temperatures in Oslo, London and Paris.

List solution:

temps = [13, 15.4, 17.5]
# temps[0]: Oslo
# temps[1]: London
# temps[2]: Paris
print 'The temperature in Oslo is', temps[0]

Can look up a temperature by mapping city to index to float

But it would be more natural to write temps[Oslo]!

Dictionaries map strings to objects

# Initialize dictionary
temps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}

# Applications
print 'The temperature in London is', temps['London']
print 'The temperature in Oslo is',   temps['Oslo']

Important:

The string index, like Oslo, is called key, while temps['Oslo'] is the associated value
A dictionary is an unordered collection of key-value pairs

Initializing dictionaries

Two ways of initializing a collection of key-value pairs:

mydict = {'key1': value1, 'key2': value2, ...}

temps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}

# or
mydict = dict(key1=value1, key2=value2, ...)

temps = dict(Oslo=13, London=15.4, Paris=17.5)

Add a new element to a dict (dict = dictionary):

>>> temps['Madrid'] = 26.0
>>> print temps
{'Oslo': 13, 'London': 15.4, 'Paris': 17.5, 'Madrid': 26.0}

Looping (iterating) over a dict means looping over the keys

for key in dictionary:
    value = dictionary[key]
    print value

Example:

>>> for city in temps:
...     print 'The %s temperature is %g' % (city, temps[city])
...
The Paris temperature is 17.5
The Oslo temperature is 13
The London temperature is 15.4
The Madrid temperature is 26

Note: the sequence of keys is arbitrary! Use sort if you need a particular sequence:

for city in sorted(temps):   # alphabetic sort of keys
    value = temps[city]
    print value

Can test for particular keys, delete elements, etc

Does the dict have a particular key?

>>> if 'Berlin' in temps:
...     print 'Berlin:', temps['Berlin']
... else:
...     print 'No temperature data for Berlin'
...
No temperature data for Berlin
>>> 'Oslo' in temps     # standard boolean expression
True

Delete an element of a dict:

>>> del temps['Oslo']   # remove Oslo key w/value
>>> temps
{'Paris': 17.5, 'London': 15.4, 'Madrid': 26.0}
>>> len(temps)          # no of key-value pairs in dict.
3

The keys and values can be reached as lists

Python version 2:

>>> temps.keys()
['Paris', 'London', 'Madrid']
>>> temps.values()
[17.5, 15.4, 26.0]

Python version 3: temps.keys() and temps.values() return iterable view objects, not lists!

>>> for city in temps.keys():   # the loop works in Py 2 and 3
...     print city
...
Paris
Madrid
London
>>> keys_list = list(temps.keys())   # Py 3: view -> list

Caution: two variables can alter the same dictionary

>>> t1 = temps
>>> t1['Stockholm'] = 10.0    # change t1
>>> temps                     # temps is also changed!
{'Stockholm': 10.0, 'Paris': 17.5, 'London': 15.4, 'Madrid': 26.0}
>>> t2 = temps.copy()         # take a copy
>>> t2['Paris'] = 16
>>> t1['Paris']               # t1 was not changed
17.5

Recall the same for lists:

>>> L = [1, 2, 3]
>>> M = L
>>> M[1] = 8
>>> L[1]
8
>>> M = L[:]    # take copy of L
>>> M[2] = 0
>>> L[2]
3

Any constant object can be used as key

So far: key is text (string object)
Keys can be any immutable (constant) object (!)

>>> d = {1: 34, 2: 67, 3: 0}           # key is int
>>> d = {13: 'Oslo', 15.4: 'London'}   # possible
>>> d = {(0,0): 4, (1,-1): 5}          # key is tuple
>>> d = {[0,0]: 4, [-1,1]: 5}          # list is mutable/changeable
...
TypeError: unhashable type: 'list'

Example: Polynomials represented by dictionaries

The information in the polynomial $$ p(x)=-1 + x^2 + 3x^7 $$ can be represented by a dict with power as key (int) and coefficient as value (float):

p = {0: -1, 2: 1, 7: 3}

Evaluate such a polynomial $ \sum_{i\in I} c_ix^i $ for some $ x $:

def eval_poly_dict(poly, x):
    sum = 0.0
    for power in poly:
        sum += poly[power]*x**power
    return sum

Short pro version:

def eval_poly_dict2(poly, x):
    # Python's sum can add elements of an iterator
    return sum(poly[power]*x**power for power in poly)

Polynomials can also be represented by lists

The list index corresponds to the power, e.g., the data in $ -1 + x^2 + 3x^7 $ is represented as

p = [-1, 0, 1, 0, 0, 0, 0, 3]

The general polynomial $ \sum_{i=0}^N c_ix^i $ is stored as [c0, c1, c2, ..., cN].

Evaluate such a polynomial $ \sum_{i=0}^N c_ix^i $ for some $ x $:

def eval_poly_list(poly, x):
    sum = 0
    for power in range(len(poly)):
        sum += poly[power]*x**power
    return sum

What is best for polynomials: lists or dictionaries?

Dictionaries need only store the nonzero terms. Compare dict vs list for the polynomial $ 1 - x^{200} $:

p = {0: 1, 200: -1}          # len(p) is 2
p = [1, 0, 0, 0, ..., -1]    # len(p) is 201

Dictionaries can easily handle negative powers, e.g., $ {1\over2}x^{-3} + 2x^4 $

p = {-3: 0.5, 4: 2}
print eval_poly_dict(p, x=4)
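The storage comparison above can be checked with a small sketch. It is written for Python 3, so print is a function. The helper name poly_list_to_dict is made up for this illustration and is not taken from the slides or the book. The two evaluation functions are compact rewrites of eval_poly_dict and eval_poly_list.

```python
def poly_list_to_dict(coeffs):
    """Hypothetical helper: turn [c0, c1, ..., cN] into {power: coefficient},
    keeping only the nonzero terms."""
    return {power: c for power, c in enumerate(coeffs) if c != 0}

def eval_poly_dict(poly, x):
    # Sum c*x**power over the stored (nonzero) terms
    return sum(c * x**power for power, c in poly.items())

def eval_poly_list(coeffs, x):
    # The list index plays the role of the power
    return sum(c * x**power for power, c in enumerate(coeffs))

dense = [1] + [0]*199 + [-1]        # 1 - x^200 stored as a list
sparse = poly_list_to_dict(dense)   # the same polynomial as a dict

print(len(dense), len(sparse))
print(eval_poly_list(dense, 2) == eval_poly_dict(sparse, 2))

negative = {-3: 0.5, 4: 2}          # negative powers only work with the dict
print(eval_poly_dict(negative, x=4))
```

Both representations give the same value, but only the dict avoids storing the 199 zero coefficients.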

Quick recap of file reading

infile  = open(filename, 'r')   # open file for reading
line    = infile.readline()     # read the next line
filestr = infile.read()         # read rest of file into string
lines   = infile.readlines()    # read rest of file into list
for line in infile:             # read rest of file line by line
infile.close()                  # remember to close!

Example: Read file data into a dictionary

Data file:

Oslo:     21.8
London:   18.1
Berlin:   19
Paris:    23
Rome:     26
Helsinki: 17.8

Store in dict, with city names as keys and temperatures as values

Program:

infile = open('deg2.dat', 'r')
temps = {}                   # start with empty dict
for line in infile.readlines():
    city, temp = line.split()
    city = city[:-1]         # remove last char (:)
    temps[city] = float(temp)
infile.close()
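A variation on the program above, sketched for Python 3. It splits each line at the colon instead of at whitespace, so a city name containing a space would also work, and it uses a with statement so the file is closed automatically. The function name read_temps and the temporary sample file are assumptions made for this illustration only.

```python
import os
import tempfile

# Sample content in the same "City: value" layout as the data file above
sample = ("Oslo: 21.8\nLondon: 18.1\nBerlin: 19\n"
          "Paris: 23\nRome: 26\nHelsinki: 17.8\n")

def read_temps(filename):
    """Hypothetical helper: read 'City: temperature' lines into a dict."""
    temps = {}
    with open(filename, 'r') as infile:    # file is closed automatically
        for line in infile:
            if not line.strip():           # skip blank lines
                continue
            city, temp = line.split(':')   # split at the colon
            temps[city.strip()] = float(temp)
    return temps

# Write the sample to a temporary file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), 'deg2_sample.dat')
with open(path, 'w') as outfile:
    outfile.write(sample)

temps = read_temps(path)
print(temps['Oslo'])
```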

A tabular file can be read into a nested dictionary

Data file table.dat:

        A       B       C       D
1    11.7   0.035    2017    99.1
2     9.2   0.037    2019   101.2
3    12.2      no      no   105.2
4    10.1   0.031      no   102.1
5     9.1   0.033    2009   103.3
6     8.7   0.036    2015   101.9

Create a dict data[p][i] (dict of dict) to hold measurement no. i (1, 2, etc.) of property p ('A', 'B', etc.)

We must first develop the plan (algorithm) for doing this

Examine the first line:

split it into words
initialize a dictionary with the property names as keys and empty dictionaries {} as values

For each of the remaining lines:

split line into words
for each word after the first: if word is not no, convert to float and store

Good exercise: do this now!
(See the book for a complete implementation.)
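For readers who want to compare with their own attempt, one possible sketch of the steps above follows (Python 3). It is not the book's implementation. The function name read_table is an assumption, and the table is embedded as a string so the example runs without table.dat.

```python
table_text = """\
        A       B       C       D
1    11.7   0.035    2017    99.1
2     9.2   0.037    2019   101.2
3    12.2      no      no   105.2
4    10.1   0.031      no   102.1
5     9.1   0.033    2009   103.3
6     8.7   0.036    2015   101.9
"""

def read_table(lines):
    """Hypothetical helper: build data[p][i] from the lines of the table."""
    properties = lines[0].split()         # first line: property names
    data = {p: {} for p in properties}    # one empty dict per property
    for line in lines[1:]:
        words = line.split()
        if not words:                     # skip blank lines
            continue
        i = int(words[0])                 # measurement number
        for p, word in zip(properties, words[1:]):
            if word != 'no':              # 'no' marks a missing value
                data[p][i] = float(word)
    return data

data = read_table(table_text.splitlines())
print(data['A'][1])
print(3 in data['C'])    # measurement 3 of C is missing in the table
```

Missing values are simply left out of the inner dictionaries, so a test like `3 in data['C']` tells whether a measurement exists.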

Example: Download data from the web and visualize

Problem:

Compare the stock prices of Microsoft, Apple, and Google over decades
http://finance.yahoo.com/ offers such data in files with tabular form

Date,Open,High,Low,Close,Volume,Adj Close
2014-02-03,502.61,551.19,499.30,545.99,12244400,545.99
2014-01-02,555.68,560.20,493.55,500.60,15698500,497.62
2013-12-02,558.00,575.14,538.80,561.02,12382100,557.68
2013-11-01,524.02,558.33,512.38,556.07,9898700,552.76
2013-10-01,478.45,539.25,478.28,522.70,12598400,516.57
...
1984-10-01,25.00,27.37,22.50,24.87,5654600,2.73
1984-09-07,26.50,29.00,24.62,25.12,5328800,2.76

We need to analyze the file format to find the algorithm for interpreting the content

Date,Open,High,Low,Close,Volume,Adj Close
2014-02-03,502.61,551.19,499.30,545.99,12244400,545.99
2014-01-02,555.68,560.20,493.55,500.60,15698500,497.62
2013-12-02,558.00,575.14,538.80,561.02,12382100,557.68
2013-11-01,524.02,558.33,512.38,556.07,9898700,552.76
2013-10-01,478.45,539.25,478.28,522.70,12598400,516.57
...
1984-10-01,25.00,27.37,22.50,24.87,5654600,2.73
1984-09-07,26.50,29.00,24.62,25.12,5328800,2.76

File format:

Columns are separated by comma
First column is the date, the final is the price of interest
The prices start at different dates

We need algorithms before we can write code

Algorithm for reading data:

skip first line
read line by line
split each line wrt. comma
store first word (date) in a list of dates
store final word (price) in a list of prices
collect date and price list in a dictionary (key is company)
make a function for reading one company's file

Plotting:

Convert year-month-day time specifications in strings into year coordinates along the x axis
Note that the companies' price history starts at different years

No code is presented here...

See the book for all details. If you understand this quite comprehensive example, you know and understand a lot!
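The slides defer the full program to the book. Purely as an illustration, the reading steps listed above might be sketched as follows (Python 3). The names read_prices and year_coordinate, the dictionary key 'company', and the decimal-year formula are assumptions of this sketch, not the book's code. The sample lines are copied from the data excerpt shown earlier.

```python
sample_lines = """\
Date,Open,High,Low,Close,Volume,Adj Close
2014-02-03,502.61,551.19,499.30,545.99,12244400,545.99
2014-01-02,555.68,560.20,493.55,500.60,15698500,497.62
2013-12-02,558.00,575.14,538.80,561.02,12382100,557.68
""".splitlines()

def read_prices(lines):
    """Hypothetical helper: return (dates, prices) for one company's file."""
    dates, prices = [], []
    for line in lines[1:]:                  # skip the header line
        words = line.strip().split(',')     # split wrt. comma
        if len(words) < 2:                  # skip blank or malformed lines
            continue
        dates.append(words[0])              # first word: date
        prices.append(float(words[-1]))     # final word: price
    # The file lists the newest date first; reverse to chronological order
    dates.reverse()
    prices.reverse()
    return dates, prices

def year_coordinate(date):
    """Crude decimal year from 'yyyy-mm-dd' (approximation chosen for
    this sketch: equal-length months, 365-day years)."""
    year, month, day = (int(w) for w in date.split('-'))
    return year + (month - 1)/12.0 + (day - 1)/365.0

dates, prices = read_prices(sample_lines)
data = {'company': {'dates': dates, 'prices': prices}}  # key is company
print(dates[0], prices[0])
print(year_coordinate('2014-01-01'))
```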

Plot of normalized stock prices in logarithmic scale

Much computer history in this plot:

String manipulation

>>> s = 'This is a string'
>>> s.split()
['This', 'is', 'a', 'string']
>>> 'This' in s
True
>>> s.find('is')    # first match is the 'is' inside 'This'
2
>>> ', '.join(s.split())
'This, is, a, string'

String manipulation is key to interpret the content of files

Text in Python is represented as strings
Inspecting and manipulating strings is the way we can understand the contents of files
Plan: first show basic operations, then address real examples

Sample string used for illustrations:

>>> s = 'Berlin: 18.4 C at 4 pm'

Strings behave much like lists/tuples - they are a sequence of characters:

>>> s[0]
'B'
>>> s[1]
'e'
>>> s[-1]
'm'

Extracting substrings

Substrings are just as slices of lists and arrays:

>>> s
'Berlin: 18.4 C at 4 pm'
>>> s[8:]     # from index 8 to the end of the string
'18.4 C at 4 pm'
>>> s[8:12]   # index 8, 9, 10 and 11 (not 12!)
'18.4'
>>> s[8:-1]
'18.4 C at 4 p'
>>> s[8:-8]
'18.4 C'

Find start of substring:

>>> s.find('Berlin')   # where does 'Berlin' start?
0                      # at index 0
>>> s.find('pm')
20
>>> s.find('Oslo')     # not found
-1

Checking if a substring is contained in a string

>>> 'Berlin' in s
True
>>> 'Oslo' in s
False
>>> if 'C' in s:
...     print 'C found'
... else:
...     print 'no C'
...
C found

Substituting a substring by another string

s.replace(s1, s2): replace s1 by s2

>>> s.replace(' ', '__')
'Berlin:__18.4__C__at__4__pm'
>>> s.replace('Berlin', 'Bonn')
'Bonn: 18.4 C at 4 pm'

Example: replace the text before the first colon by 'Bonn'

>>> s
'Berlin: 18.4 C at 4 pm'
>>> s.replace(s[:s.find(':')], 'Bonn')
'Bonn: 18.4 C at 4 pm'

1) s.find(':') returns 6, 2) s[:6] is 'Berlin', 3) Berlin is replaced by 'Bonn'
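One detail worth keeping in mind: replace substitutes every occurrence of the substring, not only the one before the colon. The Python 3 sketch below uses a made-up string in which the city name appears twice and compares replace with a slicing alternative that touches the prefix only.

```python
# Made-up string (not from the slides) where 'Berlin' occurs twice
s = 'Berlin: Berlin had 18.4 C at 4 pm'

colon = s.find(':')                   # index of the first colon

# replace changes ALL occurrences of the text before the colon
replaced_everywhere = s.replace(s[:colon], 'Bonn')

# Slicing keeps everything from the colon onwards untouched
replaced_prefix = 'Bonn' + s[colon:]

print(replaced_everywhere)
print(replaced_prefix)
```

A third option is the optional count argument, as in `s.replace(s[:colon], 'Bonn', 1)`, which limits the substitution to the first match.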

Splitting a string into a list of substrings

s.split(sep): split s into a list of substrings separated by sep (no separator implies split wrt whitespace):

>>> s
'Berlin: 18.4 C at 4 pm'
>>> s.split(':')
['Berlin', ' 18.4 C at 4 pm']
>>> s.split()
['Berlin:', '18.4', 'C', 'at', '4', 'pm']

Try to understand this one:

>>> s.split(':')[1].split()[0]
'18.4'
>>> deg = float(_)   # _ represents the last result
>>> deg
18.4
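The chained expression can be unpacked one step at a time. The Python 3 sketch below uses intermediate variable names chosen only for this illustration. Note that _ is a feature of the interactive shell, so a script has to store the result in a variable instead.

```python
s = 'Berlin: 18.4 C at 4 pm'

parts = s.split(':')          # ['Berlin', ' 18.4 C at 4 pm']
after_colon = parts[1]        # ' 18.4 C at 4 pm'
words = after_colon.split()   # ['18.4', 'C', 'at', '4', 'pm']
first_word = words[0]         # '18.4'
deg = float(first_word)       # 18.4

print(deg)
```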

Splitting a string into lines

Very often, a string contains lots of text and we want to split the text into separate lines
Lines may be separated by different control characters on different platforms: \n on Unix/Linux/Mac, \r\n on Windows

>>> t = '1st line\n2nd line\n3rd line'   # Unix-line
>>> print t
1st line
2nd line
3rd line
>>> t.split('\n')
['1st line', '2nd line', '3rd line']
>>> t.splitlines()
['1st line', '2nd line', '3rd line']
>>> t = '1st line\r\n2nd line\r\n3rd line'   # Windows
>>> t.split('\n')
['1st line\r', '2nd line\r', '3rd line']   # not what we want
>>> t.splitlines()   # cross platform!
['1st line', '2nd line', '3rd line']

Strings are constant - immutable - objects

You cannot change a string in-place (as you can with lists and arrays) - all changes of a string result in a new string

>>> s[18] = 5
...
TypeError: 'str' object does not support item assignment

>>> # build a new string by adding pieces of s:
>>> s2 = s[:18] + '5' + s[19:]
>>> s2
'Berlin: 18.4 C at 5 pm'

Stripping off leading/trailing whitespace

>>> s = '   text with leading/trailing space   \n'
>>> s.strip()
'text with leading/trailing space'
>>> s.lstrip()   # left strip
'text with leading/trailing space   \n'
>>> s.rstrip()   # right strip
'   text with leading/trailing space'

Some convenient string functions

>>> '214'.isdigit()
True
>>> '  214 '.isdigit()
False
>>> '2.14'.isdigit()
False
>>> s = 'Berlin: 18.4 C at 4 pm'   # back to the sample string
>>> s.lower()
'berlin: 18.4 c at 4 pm'
>>> s.upper()
'BERLIN: 18.4 C AT 4 PM'
>>> s.startswith('Berlin')
True
>>> s.endswith('am')
False
>>> '    '.isspace()   # blanks
True
>>> '  \n'.isspace()   # newline
True
>>> '  \t '.isspace()  # TAB
True
>>> ''.isspace()       # empty string
False

Joining a list of substrings to a new string

We can put strings together with a delimiter in between:

>>> strings = ['Newton', 'Secant', 'Bisection']
>>> ', '.join(strings)
'Newton, Secant, Bisection'

These are inverse operations:

t = delimiter.join(stringlist)
stringlist = t.split(delimiter)

Split off the first two words on a line:

>>> line = 'This is a line of words separated by space'
>>> words = line.split()
>>> line2 = ' '.join(words[2:])
>>> line2
'a line of words separated by space'

Example: Read pairs of numbers (x,y) from a file

Sample file:

(1.3,0)    (-1,2)    (3,-1.5)
(0,1)      (1,0)     (1,1)
(0,-0.01)  (10.5,-1) (2.5,-2.5)

Algorithm:

Read line by line
For each line, split line into words
For each word, strip off the parentheses and split the rest wrt comma

The code for reading pairs

lines = open('read_pairs.dat', 'r').readlines()

pairs = []   # list of (n1, n2) pairs of numbers
for line in lines:
    words = line.split()
    for word in words:
        word = word[1:-1]   # strip off parentheses
        n1, n2 = word.split(',')
        n1 = float(n1);  n2 = float(n2)
        pair = (n1, n2)
        pairs.append(pair)

Output of a pretty print of the pairs list

[(1.3, 0.0),
 (-1.0, 2.0),
 (3.0, -1.5),
 (0.0, 1.0),
 (1.0, 0.0),
 (1.0, 1.0),
 (0.0, -0.01),
 (10.5, -1.0),
 (2.5, -2.5)]

Alternative solution: Python syntax in file format

Suppose the file format

(1.3, 0)    (-1, 2)    (3, -1.5)
...

was slightly different:

[(1.3, 0),    (-1, 2),    (3, -1.5),
...
]

Running eval on the perturbed format produces the desired list!

text = open('read_pairs2.dat', 'r').read()
text = '[' + text.replace(')', '),') + ']'
pairs = eval(text)
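Since eval executes any Python expression it is given, it is risky on files from an untrusted source. A possible alternative, sketched here for Python 3, is ast.literal_eval from the standard library, which accepts only literals such as numbers, tuples and lists. The file content is embedded as a string so the sketch runs without read_pairs2.dat. Swapping eval for ast.literal_eval is a suggestion of this sketch, not something the slides prescribe.

```python
import ast

# Same layout as the sample file with pairs shown earlier
text = ("(1.3,0)    (-1,2)    (3,-1.5)\n"
        "(0,1)      (1,0)     (1,1)\n"
        "(0,-0.01)  (10.5,-1) (2.5,-2.5)\n")

# Add a comma after each pair and wrap in brackets: valid list syntax
text = '[' + text.replace(')', '),') + ']'

# literal_eval parses literals only and refuses function calls etc.
pairs = ast.literal_eval(text)

print(len(pairs))
print(pairs[0], pairs[-1])
```

Unlike the explicit float conversion in the first solution, the numbers keep the type they have in the file, so 0 stays an int while 1.3 becomes a float.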

Web pages are nothing but text files

The text is a mix of HTML commands and the text displayed in the browser:

<html>
<body bgcolor="orange">
<h1>A Very Simple Web Page</h1> <!-- headline -->
Ordinary text is written as ordinary text, but when we
need headlines, lists,
<ul>
<li><em>emphasized words</em>, or
<li> <b>boldfaced words</b>,
</ul>
we need to embed the text inside HTML tags. We can also
insert GIF or PNG images, taken from other Internet sites,
if desired.
<hr> <!-- horizontal line -->
<img src="http://www.simula.no/simula_logo.gif">
</body>
</html>

The web page generated by HTML code from the previous slide

Programs can extract information from web pages

A program can download a web page, as an HTML file, and extract data by interpreting the text in the file (using string operations).
Example: climate data from the UK

Download oxforddata.txt to a local file Oxford.txt:

import urllib   # Python 2; in Python 3 urlretrieve lives in urllib.request
baseurl = 'http://www.metoffice.gov.uk/climate/uk/stationdata'
filename = 'oxforddata.txt'
url = baseurl + '/' + filename
urllib.urlretrieve(url, filename='Oxford.txt')
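Under Python 3 the same download might be sketched as below. The wrapper function download is a name invented for this illustration. The call is left commented out because it needs network access, and the address from 2015 may no longer be served.

```python
import urllib.request

baseurl = 'http://www.metoffice.gov.uk/climate/uk/stationdata'
filename = 'oxforddata.txt'
url = baseurl + '/' + filename

def download(url, local_filename):
    """Hypothetical wrapper: fetch url and store it as local_filename.
    Requires network access."""
    urllib.request.urlretrieve(url, local_filename)

# download(url, 'Oxford.txt')   # uncomment to perform the download
print(url)
```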

The structure of the Oxford.txt weather data file

Oxford
Location: 4509E 2072N, 63 metres amsl
Estimated data is marked with a * after the value.
Missing data (more than 2 days missing in month) is marked by  ---.
Sunshine data taken from an automatic ...
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1853   1    8.4     2.7       4    62.8     ---
   1853   2    3.2    -1.8      19    29.3     ---
   1853   3    7.7    -0.6      20    25.9     ---
   1853   4   12.6     4.5       0    60.1     ---
   1853   5   16.8     6.1       0    59.5     ---
...
   2010   5   17.6     7.3       0    28.6   207.4
   2010   6   23.0    11.1       0    34.5   230.5
   2010   7   23.3*   14.1*      0*   24.4*  184.4*  Provisional
   2010  10   14.6     7.4       2    43.5   128.8   Provisional

Reading the climate data