Reading Key Value Pair in Python Separated by Space

Ch.6: Dictionaries and strings

Hans Petter Langtangen [1, 2]
[1] Simula Research Laboratory
[2] University of Oslo, Dept. of Informatics

Aug 15, 2015


Goals

  • Learn more about file reading
  • Store file data in a new object type: dictionary
  • Interpret content in files via string manipulation

The main focus in the course is on working with files, dictionaries and strings. The book has additional material on how to utilize data from the Internet.

Dictionaries

import os

figfiles = {'fig1.pdf': 81761, 'fig2.png': 8754}
figfiles['fig3.png'] = os.path.getsize(filename)   # filename: path to fig3.png
for name in figfiles:
    print 'File size of %s is %d:' % (name, figfiles[name])

A dictionary is a generalization of a list

  • Features of lists:
    • store a sequence of elements in a single object ([1,3,-1])
    • each element is a Python object
    • the elements are indexed by integers 0, 1, ...
  • Dictionaries can index objects in a collection via text
    (= "lists with text index")
  • Dictionary in Python is called hash, HashMap and associative array in other languages

The list index is sometimes unnatural for locating an element of a collection of objects

Suppose we need to store the temperatures in Oslo, London and Paris.

List solution:

temps = [13, 15.4, 17.5]
# temps[0]: Oslo
# temps[1]: London
# temps[2]: Paris
print 'The temperature in Oslo is', temps[0]

Can look up a temperature by mapping city to index to float

But it would be more natural to write temps[Oslo]!

Dictionaries map strings to objects

# Initialize dictionary
temps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}

# Applications
print 'The temperature in London is', temps['London']
print 'The temperature in Oslo is',   temps['Oslo']

Important:

  • The string index, like Oslo, is called key, while temps['Oslo'] is the associated value
  • A dictionary is an unordered collection of key-value pairs

Initializing dictionaries

Two ways of initializing a collection of key-value pairs:

mydict = {'key1': value1, 'key2': value2, ...}
temps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}

# or
mydict = dict(key1=value1, key2=value2, ...)
temps = dict(Oslo=13, London=15.4, Paris=17.5)

Add a new element to a dict (dict = dictionary):

>>> temps['Madrid'] = 26.0
>>> print temps
{'Oslo': 13, 'London': 15.4, 'Paris': 17.5, 'Madrid': 26.0}

Looping (iterating) over a dict means looping over the keys

for key in dictionary:
    value = dictionary[key]
    print value

Example:

>>> for city in temps:
...     print 'The %s temperature is %g' % (city, temps[city])
...
The Paris temperature is 17.5
The Oslo temperature is 13
The London temperature is 15.4
The Madrid temperature is 26

Note: the sequence of keys is arbitrary! Use sorted if you need a particular sequence:

for city in sorted(temps):   # alphabetic sort of keys
    value = temps[city]
    print value

Can test for particular keys, delete elements, etc

Does the dict have a particular key?

>>> if 'Berlin' in temps:
...     print 'Berlin:', temps['Berlin']
... else:
...     print 'No temperature data for Berlin'
...
No temperature data for Berlin
>>> 'Oslo' in temps     # standard boolean expression
True

Delete an element of a dict:

>>> del temps['Oslo']   # remove Oslo key w/value
>>> temps
{'Paris': 17.5, 'London': 15.4, 'Madrid': 26.0}
>>> len(temps)          # no of key-value pairs in dict.
3

The keys and values can be reached as lists

Python version 2:

>>> temps.keys()
['Paris', 'London', 'Madrid']
>>> temps.values()
[17.5, 15.4, 26.0]

Python version 3: temps.keys() and temps.values() are iterable views, not lists!

>>> for city in temps.keys():   # the loop header works in Py 2 and 3
...     print city
...
Paris
Madrid
London
>>> keys_list = list(temps.keys())   # Py 3: view -> list

Caution: two variables can alter the same dictionary

>>> t1 = temps
>>> t1['Stockholm'] = 10.0    # change t1
>>> temps                     # temps is also changed!
{'Stockholm': 10.0, 'Paris': 17.5, 'London': 15.4, 'Madrid': 26.0}
>>> t2 = temps.copy()         # take a copy
>>> t2['Paris'] = 16
>>> t1['Paris']               # t1 was not changed
17.5

Recall the same for lists:

>>> L = [1, 2, 3]
>>> M = L
>>> M[1] = 8
>>> L[1]
8
>>> M = L[:]                  # take copy of L
>>> M[2] = 0
>>> L[2]
3

Any constant object can be used as key

  • So far: key is text (string object)
  • Keys can be any immutable (constant) object (!)
>>> d = {1: 34, 2: 67, 3: 0}           # key is int
>>> d = {13: 'Oslo', 15.4: 'London'}   # possible
>>> d = {(0,0): 4, (1,-1): 5}          # key is tuple
>>> d = {[0,0]: 4, [-1,1]: 5}          # list is mutable/changeable
...
TypeError: unhashable type: 'list'

Example: Polynomials represented by dictionaries

The information in the polynomial $$ p(x)=-1 + x^2 + 3x^7 $$ can be represented by a dict with power as key (int) and coefficient as value (int or float):

p = {0: -1, 2: 1, 7: 3}

Evaluate such a polynomial \( \sum_{i\in I} c_ix^i \) for some \( x \):

def eval_poly_dict(poly, x):
    sum = 0.0
    for power in poly:
        sum += poly[power]*x**power
    return sum

Short pro version:

def eval_poly_dict2(poly, x):
    # Python's sum can add elements of an iterator
    return sum(poly[power]*x**power for power in poly)

Polynomials can also be represented by lists

The list index corresponds to the power, e.g., the data in \( -1 + x^2 + 3x^7 \) is represented as

p = [-1, 0, 1, 0, 0, 0, 0, 3]

The general polynomial \( \sum_{i=0}^N c_ix^i \) is stored as [c0, c1, c2, ..., cN].

Evaluate such a polynomial \( \sum_{i=0}^N c_ix^i \) for some \( x \):

def eval_poly_list(poly, x):
    sum = 0
    for power in range(len(poly)):
        sum += poly[power]*x**power
    return sum
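As a quick check that the two representations describe the same polynomial, the sketch below converts the list form to the dict form and evaluates both. The helper name list2dict is an assumption introduced here for illustration; it is not part of the original slides. The code is written so that it runs under both Python 2.7 and Python 3.

```python
def eval_poly_dict(poly, x):
    """Evaluate a polynomial stored as {power: coefficient}."""
    return sum(poly[power]*x**power for power in poly)

def eval_poly_list(poly, x):
    """Evaluate a polynomial stored as [c0, c1, ..., cN]."""
    return sum(poly[power]*x**power for power in range(len(poly)))

def list2dict(poly_list):
    """Hypothetical helper: keep only the nonzero coefficients."""
    return {power: c for power, c in enumerate(poly_list) if c != 0}

p_list = [-1, 0, 1, 0, 0, 0, 0, 3]      # -1 + x^2 + 3x^7
p_dict = list2dict(p_list)              # only three terms are stored

x = 2
value_from_list = eval_poly_list(p_list, x)
value_from_dict = eval_poly_dict(p_dict, x)
print(value_from_list == value_from_dict)
```

Both calls compute -1 + 2^2 + 3*2^7, so the comparison should print True.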

What is best for polynomials: lists or dictionaries?

Dictionaries need only store the nonzero terms. Compare dict vs list for the polynomial \( 1 - x^{200} \):

p = {0: 1, 200: -1}           # len(p) is 2
p = [1, 0, 0, 0, ..., -1]     # len(p) is 201

Dictionaries can easily handle negative powers, e.g., \( {1\over2}x^{-3} + 2x^4 \)

p = {-3: 0.5, 4: 2}
print eval_poly_dict(p, x=4)

Quick recap of file reading

infile  = open(filename, 'r')   # open file for reading
line    = infile.readline()     # read the next line
filestr = infile.read()         # read rest of file into string
lines   = infile.readlines()    # read rest of file into list
for line in infile:             # read rest of file line by line
infile.close()                  # remember to close!

Example: Read file data into a dictionary

Data file:

Oslo:      21.8
London:    18.1
Berlin:    19
Paris:     23
Rome:      26
Helsinki:  17.8

Store in dict, with city names as keys and temperatures as values

Program:

infile = open('deg2.dat', 'r')
temps = {}                   # start with empty dict
for line in infile.readlines():
    city, temp = line.split()
    city = city[:-1]         # remove last char (:)
    temps[city] = float(temp)
infile.close()
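A possible variant, sketched under a few assumptions: it uses a with block so the file is closed automatically, and it splits each line at the colon instead of at whitespace, which would also cope with city names containing spaces (none occur in the sample file). To stay self-contained, the sketch first recreates the sample data in a temporary directory; that setup step is not part of the original program.

```python
import os
import tempfile

# Recreate the sample data file in a temporary directory
sample = """\
Oslo:      21.8
London:    18.1
Berlin:    19
Paris:     23
Rome:      26
Helsinki:  17.8
"""
filename = os.path.join(tempfile.mkdtemp(), 'deg2.dat')
with open(filename, 'w') as outfile:
    outfile.write(sample)

temps = {}
with open(filename, 'r') as infile:       # closed automatically
    for line in infile:
        if not line.strip():              # skip blank lines
            continue
        city, temp = line.split(':')      # split at the colon
        temps[city.strip()] = float(temp) # float() ignores whitespace

print(temps['Oslo'])
```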

A tabular file can be read into a nested dictionary

Data file table.dat:

        A        B       C      D
1       11.7     0.035   2017   99.1
2       9.2      0.037   2019   101.2
3       12.2     no      no     105.2
4       10.1     0.031   no     102.1
5       9.1      0.033   2009   103.3
6       8.7      0.036   2015   101.9

Create a dict data[p][i] (dict of dict) to hold measurement no. i (1, 2, etc.) of property p ('A', 'B', etc.)

We must first develop the plan (algorithm) for doing this

  1. Examine the first line:
    1. split it into words
    2. initialize a dictionary with the property names as keys and empty dictionaries {} as values
  2. For each of the remaining lines:
    1. split line into words
    2. for each word after the first: if word is not no, convert to float and store

Good exercise: do this now!
(See the book for a complete implementation.)
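One way the plan above might be turned into code is sketched here. This is not the book's implementation. As an assumption made to keep the example self-contained, the table is embedded as a string instead of being read from table.dat.

```python
table_text = """\
        A        B       C      D
1       11.7     0.035   2017   99.1
2       9.2      0.037   2019   101.2
3       12.2     no      no     105.2
4       10.1     0.031   no     102.1
5       9.1      0.033   2009   103.3
6       8.7      0.036   2015   101.9
"""
lines = table_text.splitlines()

# Step 1: the first line holds the property names
properties = lines[0].split()
data = {p: {} for p in properties}

# Step 2: each remaining line holds a measurement number and values
for line in lines[1:]:
    words = line.split()
    i = int(words[0])                       # measurement number
    for p, word in zip(properties, words[1:]):
        if word != 'no':                    # 'no' marks a missing value
            data[p][i] = float(word)

print(data['C'])
```

Missing entries are simply absent from the inner dictionaries, so a test like 3 in data['C'] tells whether measurement 3 exists for property C.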

Example: Download data from the web and visualize

Problem:

  • Compare the stock prices of Microsoft, Apple, and Google over decades
  • http://finance.yahoo.com/ offers such data in files with tabular form
Date,Open,High,Low,Close,Volume,Adj Close
2014-02-03,502.61,551.19,499.30,545.99,12244400,545.99
2014-01-02,555.68,560.20,493.55,500.60,15698500,497.62
2013-12-02,558.00,575.14,538.80,561.02,12382100,557.68
2013-11-01,524.02,558.33,512.38,556.07,9898700,552.76
2013-10-01,478.45,539.25,478.28,522.70,12598400,516.57
...
1984-10-01,25.00,27.37,22.50,24.87,5654600,2.73
1984-09-07,26.50,29.00,24.62,25.12,5328800,2.76

We need to analyze the file format to find the algorithm for interpreting the content

Date,Open,High,Low,Close,Volume,Adj Close
2014-02-03,502.61,551.19,499.30,545.99,12244400,545.99
2014-01-02,555.68,560.20,493.55,500.60,15698500,497.62
2013-12-02,558.00,575.14,538.80,561.02,12382100,557.68
2013-11-01,524.02,558.33,512.38,556.07,9898700,552.76
2013-10-01,478.45,539.25,478.28,522.70,12598400,516.57
...
1984-10-01,25.00,27.37,22.50,24.87,5654600,2.73
1984-09-07,26.50,29.00,24.62,25.12,5328800,2.76

File format:

  • Columns are separated by comma
  • First column is the date, the final is the price of interest
  • The prices start at different dates

We need algorithms before we can write code

Algorithm for reading data:

  1. skip first line
  2. read line by line
  3. split each line wrt. comma
  4. store first word (date) in a list of dates
  5. store final word (price) in a list of prices
  6. collect date and price list in a dictionary (key is company)
  7. make a function for reading one company's file

Plotting:

  1. Convert year-month-day time specifications in strings into year coordinates along the x axis
  2. Note that the companies' price history starts at different years

No code is presented here...

See the book for all details. If you understand this quite comprehensive example, you know and understand a lot!
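Although the full program is left to the book, the reading steps can be illustrated with a minimal sketch. This is not the book's code: the function names read_prices and date2year, the dictionary key 'SomeCompany', and the simplified date conversion (every month counted as 1/12 year, day ignored) are all assumptions made for illustration. The two data lines are taken from the listing above.

```python
def read_prices(lines):
    """Return (dates, prices) from lines in the comma-separated format."""
    dates = []
    prices = []
    for line in lines[1:]:                  # skip the header line
        words = line.strip().split(',')
        if len(words) < 2:                  # skip blank/incomplete lines
            continue
        dates.append(words[0])              # first column: date
        prices.append(float(words[-1]))     # last column: adjusted close
    return dates, prices

def date2year(date):
    """Convert 'yyyy-mm-dd' to an approximate decimal year."""
    year, month, day = [int(w) for w in date.split('-')]
    # Simplification: each month counts as 1/12 year, day is ignored
    return year + (month - 1)/12.0

sample = [
    'Date,Open,High,Low,Close,Volume,Adj Close',
    '2014-02-03,502.61,551.19,499.30,545.99,12244400,545.99',
    '2014-01-02,555.68,560.20,493.55,500.60,15698500,497.62',
]
dates, prices = read_prices(sample)

# Key is the company name (placeholder name, not from the source)
data = {'SomeCompany': {'dates': dates, 'prices': prices}}
years = [date2year(d) for d in dates]
print(years)
```

For a real file, the list passed to read_prices could come from infile.readlines().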

Plot of normalized stock prices in logarithmic scale

Much computer history in this plot:

String manipulation

>>> s = 'This is a string'
>>> s.split()
['This', 'is', 'a', 'string']
>>> 'This' in s
True
>>> s.find('is')    # first match is inside 'This'
2
>>> ', '.join(s.split())
'This, is, a, string'

String manipulation is key to interpret the content of files

  • Text in Python is represented as strings
  • Inspecting and manipulating strings is the way we can understand the contents of files
  • Plan: first show basic operations, then address real examples

Sample string used for illustrations:

>>> s = 'Berlin: 18.4 C at 4 pm'

Strings behave much like lists/tuples - they are a sequence of characters:

>>> s[0]
'B'
>>> s[1]
'e'
>>> s[-1]
'm'

Extracting substrings

Substrings are just as slices of lists and arrays:

>>> s
'Berlin: 18.4 C at 4 pm'
>>> s[8:]     # from index 8 to the end of the string
'18.4 C at 4 pm'
>>> s[8:12]   # index 8, 9, 10 and 11 (not 12!)
'18.4'
>>> s[8:-1]
'18.4 C at 4 p'
>>> s[8:-8]
'18.4 C'

Find start of substring:

>>> s.find('Berlin')   # where does 'Berlin' start?
0                      # at index 0
>>> s.find('pm')
20
>>> s.find('Oslo')     # not found
-1

Checking if a substring is contained in a string

>>> 'Berlin' in s
True
>>> 'Oslo' in s
False
>>> if 'C' in s:
...     print 'C found'
... else:
...     print 'no C'
...
C found

Substituting a substring by another string

s.replace(s1, s2): replace s1 by s2

>>> s.replace(' ', '__')
'Berlin:__18.4__C__at__4__pm'
>>> s.replace('Berlin', 'Bonn')
'Bonn: 18.4 C at 4 pm'

Example: replace the text before the first colon by 'Bonn'

>>> s
'Berlin: 18.4 C at 4 pm'
>>> s.replace(s[:s.find(':')], 'Bonn')
'Bonn: 18.4 C at 4 pm'

1) s.find(':') returns 6, 2) s[:6] is 'Berlin', 3) Berlin is replaced by 'Bonn'
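One detail worth keeping in mind is that replace substitutes every occurrence of the substring, not only the one before the colon. The sketch below contrasts this with building the new string by slicing. The string s2, where the city name occurs twice, is a made-up example and not from the slides.

```python
s = 'Berlin: 18.4 C at 4 pm'
colon = s.find(':')                  # index of the first colon
s_new = 'Bonn' + s[colon:]           # keep everything from the colon on

# Hypothetical string where the city name occurs twice
s2 = 'Berlin: 18.4 C at 4 pm in Berlin'
replaced_all = s2.replace(s2[:s2.find(':')], 'Bonn')   # both changed
replaced_first = 'Bonn' + s2[s2.find(':'):]            # only the prefix

print(replaced_all)
print(replaced_first)
```

If only the first occurrence should change, s2.replace('Berlin', 'Bonn', 1) is another option, since replace accepts an optional maximum count.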

Splitting a string into a list of substrings

s.split(sep): split s into a list of substrings separated by sep (no separator implies split wrt whitespace):

>>> s
'Berlin: 18.4 C at 4 pm'
>>> s.split(':')
['Berlin', ' 18.4 C at 4 pm']
>>> s.split()
['Berlin:', '18.4', 'C', 'at', '4', 'pm']

Try to understand this one:

>>> s.split(':')[1].split()[0]
'18.4'
>>> deg = float(_)   # _ represents the last result
>>> deg
18.4
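The chained expression can be unpacked one step at a time. The underscore shortcut is only available in the interactive shell, so in a program file the intermediate results would be stored in variables; the names after_colon and words below are chosen here for illustration.

```python
s = 'Berlin: 18.4 C at 4 pm'

after_colon = s.split(':')[1]    # text after the colon: ' 18.4 C at 4 pm'
words = after_colon.split()      # split that text wrt whitespace
deg = float(words[0])            # first word converted to a number

print(deg)
```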

Splitting a string into lines

  • Very often, a string contains lots of text and we want to split the text into separate lines
  • Lines may be separated by different control characters on different platforms: \n on Unix/Linux/Mac, \r\n on Windows
>>> t = '1st line\n2nd line\n3rd line'   # Unix-line
>>> print t
1st line
2nd line
3rd line
>>> t.split('\n')
['1st line', '2nd line', '3rd line']
>>> t.splitlines()
['1st line', '2nd line', '3rd line']
>>> t = '1st line\r\n2nd line\r\n3rd line'   # Windows
>>> t.split('\n')
['1st line\r', '2nd line\r', '3rd line']   # not what we want
>>> t.splitlines()   # cross platform!
['1st line', '2nd line', '3rd line']

Strings are constant - immutable - objects

You cannot change a string in-place (as you can with lists and arrays) - every change of a string results in a new string

>>> s[18] = 5
...
TypeError: 'str' object does not support item assignment

>>> # build a new string by adding pieces of s:
>>> s2 = s[:18] + '5' + s[19:]
>>> s2
'Berlin: 18.4 C at 5 pm'

Stripping off leading/trailing whitespace

>>> s = '   text with leading/trailing space   \n'
>>> s.strip()
'text with leading/trailing space'
>>> s.lstrip()   # left strip
'text with leading/trailing space   \n'
>>> s.rstrip()   # right strip
'   text with leading/trailing space'

Some convenient string functions

>>> '214'.isdigit()
True
>>> '  214 '.isdigit()
False
>>> '2.14'.isdigit()
False
>>> s = 'Berlin: 18.4 C at 4 pm'
>>> s.lower()
'berlin: 18.4 c at 4 pm'
>>> s.upper()
'BERLIN: 18.4 C AT 4 PM'
>>> s.startswith('Berlin')
True
>>> s.endswith('am')
False
>>> '    '.isspace()   # blanks
True
>>> '  \n'.isspace()   # newline
True
>>> '  \t '.isspace()  # TAB
True
>>> ''.isspace()       # empty string
False

Joining a list of substrings to a new string

We can put strings together with a delimiter in between:

>>> strings = ['Newton', 'Secant', 'Bisection']
>>> ', '.join(strings)
'Newton, Secant, Bisection'

These are inverse operations:

t = delimiter.join(stringlist)
stringlist = t.split(delimiter)
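A small round-trip check may make the inverse relationship concrete. It also shows one limitation: the round trip breaks if an element itself contains the delimiter. The list named tricky is a made-up example for illustration.

```python
delimiter = ', '
stringlist = ['Newton', 'Secant', 'Bisection']

t = delimiter.join(stringlist)
roundtrip = t.split(delimiter)
print(roundtrip == stringlist)

# The round trip fails when an element contains the delimiter
tricky = ['a, b', 'c']
tricky_roundtrip = delimiter.join(tricky).split(delimiter)
print(tricky_roundtrip)
```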

Split off the first two words on a line:

>>> line = 'This is a line of words separated by space'
>>> words = line.split()
>>> line2 = ' '.join(words[2:])
>>> line2
'a line of words separated by space'

Example: Read pairs of numbers (x,y) from a file

Sample file:

(1.3,0)    (-1,2)    (3,-1.5)
(0,1)      (1,0)     (1,1)
(0,-0.01)  (10.5,-1) (2.5,-2.5)

Algorithm:

  1. Read line by line
  2. For each line, split line into words
  3. For each word, strip off the parentheses and split the rest wrt comma

The code for reading pairs

lines = open('read_pairs.dat', 'r').readlines()

pairs = []   # list of (n1, n2) pairs of numbers
for line in lines:
    words = line.split()
    for word in words:
        word = word[1:-1]   # strip off parentheses
        n1, n2 = word.split(',')
        n1 = float(n1);  n2 = float(n2)
        pair = (n1, n2)
        pairs.append(pair)

Output of a pretty print of the pairs list

[(1.3, 0.0),
 (-1.0, 2.0),
 (3.0, -1.5),
 (0.0, 1.0),
 (1.0, 0.0),
 (1.0, 1.0),
 (0.0, -0.01),
 (10.5, -1.0),
 (2.5, -2.5)]

Alternative solution: Python syntax in file format

Suppose the file format

(1.3, 0)    (-1, 2)    (3, -1.5)
...

was slightly different:

[(1.3, 0),    (-1, 2),    (3, -1.5),
...
]

Running eval on the perturbed format produces the desired list!

text = open('read_pairs2.dat', 'r').read()
text = '[' + text.replace(')', '),') + ']'
pairs = eval(text)
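Since eval executes whatever Python code the file contains, a more cautious variant is to swap in ast.literal_eval from the standard library, which only accepts literals such as numbers, tuples and lists. The sketch below applies the same text transformation to a string holding the first two lines of the sample file; using a string instead of reading read_pairs2.dat is an assumption made to keep the example self-contained.

```python
import ast

# First two lines of the sample file, as they would be read by .read()
text = '(1.3,0)    (-1,2)    (3,-1.5)\n(0,1)      (1,0)     (1,1)\n'

# Same trick as above: add commas after each pair and wrap in brackets
text = '[' + text.replace(')', '),') + ']'

# literal_eval parses literals only and refuses function calls etc.
pairs = ast.literal_eval(text)
print(pairs)
```

Unlike the explicit loop with float conversions, this keeps integers as integers, e.g. (1.3, 0) rather than (1.3, 0.0).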

Web pages are nothing but text files

The text is a mix of HTML commands and the text displayed in the browser:

<html>
<body bgcolor="orange">
<h1>A Very Simple Web Page</h1> <!-- headline -->
Ordinary text is written as ordinary text, but when we
need headlines, lists,
<ul>
<li><em>emphasized words</em>, or
<li> <b>boldfaced words</b>,
</ul>
we need to embed the text inside HTML tags. We can also
insert GIF or PNG images, taken from other Internet sites,
if desired.
<hr> <!-- horizontal line -->
<img src="http://www.simula.no/simula_logo.gif">
</body>
</html>

The web page generated by HTML code from the previous slide

Programs can extract information from web pages

  • A program can download a web page, as an HTML file, and extract data by interpreting the text in the file (using string operations).
  • Example: climate data from the UK

Download oxforddata.txt to a local file Oxford.txt:

import urllib
baseurl = 'http://www.metoffice.gov.uk/climate/uk/stationdata'
filename = 'oxforddata.txt'
url = baseurl + '/' + filename
urllib.urlretrieve(url, filename='Oxford.txt')
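The call urllib.urlretrieve is Python 2 syntax; in Python 3 the function lives in urllib.request. A sketch of an import that works in both versions is shown below. The wrapper name download is an assumption for illustration, and the example call is left as a comment because it needs network access and the address above may no longer be valid.

```python
try:
    from urllib.request import urlretrieve   # Python 3
except ImportError:
    from urllib import urlretrieve           # Python 2

def download(url, local_file):
    """Download url to local_file and return the local file name."""
    local_name, headers = urlretrieve(url, filename=local_file)
    return local_name

# Example call (requires network access):
# baseurl = 'http://www.metoffice.gov.uk/climate/uk/stationdata'
# download(baseurl + '/oxforddata.txt', 'Oxford.txt')
```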

The structure of the Oxford.txt weather data file

Oxford
Location: 4509E 2072N, 63 metres amsl
Estimated data is marked with a * after the value.
Missing data (more than 2 days missing in month) is marked by  ---.
Sunshine data taken from an automatic ...
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1853   1    8.4     2.7       4    62.8     ---
   1853   2    3.2    -1.8      19    29.3     ---
   1853   3    7.7    -0.6      20    25.9     ---
   1853   4   12.6     4.5       0    60.1     ---
   1853   5   16.8     6.1       0    59.5     ---

...
   2010   5   17.6     7.3       0    28.6   207.4
   2010   6   23.0    11.1       0    34.5   230.5
   2010   7   23.3*   14.1*      0*   24.4*  184.4*  Provisional
   2010  10   14.6     7.4       2    43.5   128.8   Provisional

Reading the climate data

Algorithm:

  1. Read the place and location in the file header
  2. Skip the next 5 (for us uninteresting) lines
  3. Read the column data and store in dictionary
  4. Test for numbers with special annotation, "provisional" column, etc.

Program, part 1:

local_file = 'Oxford.txt'
infile = open(local_file, 'r')
data = {}
data['place'] = infile.readline().strip()
data['location'] = infile.readline().strip()
# Skip the next 5 lines
for i in range(5):
    infile.readline()

Reading the climate data - program, part 2

Program, part 2:

data['data'] = {}
for line in infile:
    columns = line.split()

    year = int(columns[0])
    month = int(columns[1])

    if columns[-1] == 'Provisional':
        del columns[-1]
    for i in range(2, len(columns)):
        if columns[i] == '---':
            columns[i] = None
        elif columns[i][-1] == '*' or columns[i][-1] == '#':
            # Strip off trailing character
            columns[i] = float(columns[i][:-1])
        else:
            columns[i] = float(columns[i])

Reading the climate data - program, part 3

Program, part 3:

for line in infile:
    ...
    tmax, tmin, air_frost, rain, sun = columns[2:]
    if year not in data['data']:
        data['data'][year] = {}
    data['data'][year][month] = {'tmax': tmax,
                                 'tmin': tmin,
                                 'air frost': air_frost,
                                 'rain': rain,
                                 'sun': sun}
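To give an idea of how the resulting nested dictionary might be used, the sketch below works on a small excerpt with the same layout as data['data']. The numbers are the first three 1853 rows of the file listing; only some of the keys are included, and the helper mean_of is a name introduced here for illustration.

```python
# Excerpt with the same layout as data['data'] in the program:
# data['data'][year][month] is a dict of measurements
data = {'data': {1853: {
    1: {'tmax': 8.4, 'tmin': 2.7,  'sun': None},
    2: {'tmax': 3.2, 'tmin': -1.8, 'sun': None},
    3: {'tmax': 7.7, 'tmin': -0.6, 'sun': None},
}}}

year = 1853
months = data['data'][year]

# Month with the highest maximum temperature
warmest_month = max(months, key=lambda m: months[m]['tmax'])

def mean_of(key):
    """Mean of a quantity, skipping missing values stored as None."""
    values = [months[m][key] for m in months
              if months[m][key] is not None]
    if not values:
        return None              # no data at all for this quantity
    return sum(values)/len(values)

mean_tmin = mean_of('tmin')
mean_sun = mean_of('sun')        # all sunshine values are missing
print('Warmest month in %d: %d' % (year, warmest_month))
```

Storing missing data as None, as the program does, forces this kind of explicit check before doing arithmetic.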

Summary of dictionary functionality

Construction                            Meaning
a = {}                                  initialize an empty dictionary
a = {'point': [0,0.1], 'value': 7}      initialize a dictionary
a = dict(point=[2,7], value=3)          initialize a dictionary w/string keys
a.update(b)                             add/update key-value pairs from b in a
a.update(key1=value1, key2=value2)      add/update key-value pairs in a
a['hide'] = True                        add new key-value pair to a
a['point']                              get value corresponding to key point
for key in a:                           loop over keys in unknown order
for key in sorted(a):                   loop over keys in alphabetic order
'value' in a                            True if string value is a key in a
del a['point']                          delete a key-value pair from a
list(a.keys())                          list of keys
list(a.values())                        list of values
len(a)                                  number of key-value pairs in a
isinstance(a, dict)                     is True if a is a dictionary

Summary of some string operations

s = 'Berlin: 18.4 C at 4 pm'
s[8:17]                  # extract substring
s.find(':')              # index where first ':' is found
s.split(':')             # split into substrings
s.split()                # split wrt whitespace
'Berlin' in s            # test if substring is in s
s.replace('18.4', '20')
s.lower()                # lower case letters only
s.upper()                # upper case letters only
s.split()[4].isdigit()
s.strip()                # remove leading/trailing blanks
', '.join(list_of_words)


Source: http://hplgit.github.io/scipro-primer/slides/dictstring/html/dictstring-solarized.html
