Reading Key Value Pair in Python Separated by Space
Ch.6: Dictionaries and strings
Aug 15, 2015
Goals
- Learn more about file reading
- Store file data in a new object type: dictionary
- Interpret content in files via string manipulation
The main focus in the course is on working with files, dictionaries and strings. The book has additional material on how to utilize data from the Internet.
Dictionaries
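import os
figfiles = {'fig1.pdf': 81761, 'fig2.png': 8754}
filename = 'fig3.png'    # file whose size is added to the dict
figfiles[filename] = os.path.getsize(filename)
for name in figfiles:
    # %s formats the file name (a string), %d the size in bytes
    print 'File size of %s is %d' % (name, figfiles[name])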
A dictionary is a generalization of a list
- Features of lists:
  - store a sequence of elements in a single object ([1,3,-1])
  - each element is a Python object
  - the elements are indexed by integers 0, 1, ...
- Dictionaries can index objects in a collection via text (= "lists with text index")
- Dictionary in Python is called hash, HashMap and associative array in other languages
The list index is sometimes unnatural for locating an element of a collection of objects
Suppose we need to store the temperatures in Oslo, London and Paris.
List solution:
temps = [13, 15.4, 17.5]
# temps[0]: Oslo
# temps[1]: London
# temps[2]: Paris
print 'The temperature in Oslo is', temps[0]
Can look up a temperature by mapping city to index to float
But it would be more natural to write temps[Oslo]!
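The two-step list lookup described above (city to index, index to float) can be sketched as follows. This is a minimal illustration in Python 3 syntax (print as a function), unlike the Python 2 listings in these slides; the helper list `cities` is an assumption introduced here and does not appear in the original.

```python
# Hypothetical helper list (not in the original slides): position i in
# `cities` corresponds to position i in `temps`.
cities = ['Oslo', 'London', 'Paris']
temps = [13, 15.4, 17.5]

# Two-step lookup: city name -> integer index -> temperature
index = cities.index('London')
london_temp = temps[index]
print('The temperature in London is', london_temp)
```

Keeping two parallel lists in sync is the bookkeeping a dictionary removes.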
Dictionaries map strings to objects
# Initialize dictionary
temps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}
# Applications
print 'The temperature in London is', temps['London']
print 'The temperature in Oslo is', temps['Oslo']
Important:
- The string index, like Oslo, is called key, while temps['Oslo'] is the associated value
- A dictionary is an unordered collection of key-value pairs
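A note for readers on current Python versions: "unordered" describes the Python 2 behavior these slides were written for. Since Python 3.7 the language guarantees that a dict remembers insertion order, although sorting is still needed for any other order. A small sketch in Python 3 syntax:

```python
temps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}
temps['Madrid'] = 26.0

# Python 3.7+: iterating over a dict follows insertion order
insertion_order = list(temps)
# Alphabetic order still requires an explicit sort
alphabetic = sorted(temps)

print(insertion_order)
print(alphabetic)
```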
Initializing dictionaries
Two ways of initializing a collection of key-value pairs:
mydict = {'key1': value1, 'key2': value2, ...}
temps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}
# or
mydict = dict(key1=value1, key2=value2, ...)
temps = dict(Oslo=13, London=15.4, Paris=17.5)
Add a new element to a dict (dict = dictionary):
>>> temps['Madrid'] = 26.0
>>> print temps
{'Oslo': 13, 'London': 15.4, 'Paris': 17.5, 'Madrid': 26.0}
Looping (iterating) over a dict means looping over the keys
for key in dictionary:
    value = dictionary[key]
    print value
Example:
>>> for city in temps:
...     print 'The %s temperature is %g' % (city, temps[city])
...
The Paris temperature is 17.5
The Oslo temperature is 13
The London temperature is 15.4
The Madrid temperature is 26
Note: the sequence of keys is arbitrary! Use sorted if you need a particular sequence:
for city in sorted(temps):   # alphabetic sort of keys
    value = temps[city]
    print value
Can test for particular keys, delete elements, etc
Does the dict have a particular key?
>>> if 'Berlin' in temps:
...     print 'Berlin:', temps['Berlin']
... else:
...     print 'No temperature data for Berlin'
...
No temperature data for Berlin
>>> 'Oslo' in temps     # standard boolean expression
True
Delete an element of a dict:
>>> del temps['Oslo']   # remove Oslo key w/value
>>> temps
{'Paris': 17.5, 'London': 15.4, 'Madrid': 26.0}
>>> len(temps)          # no of key-value pairs in dict.
3
The keys and values can be reached as lists
Python version 2:
>>> temps.keys()
['Paris', 'London', 'Madrid']
>>> temps.values()
[17.5, 15.4, 26.0]
Python version 3: temps.keys() and temps.values() are iterable view objects, not lists!
>>> for city in temps.keys():       # works in Py 2 and 3
...     print city
...
Paris
Madrid
London
>>> keys_list = list(temps.keys())  # Py 3: view -> list
Caution: two variables can alter the same dictionary
>>> t1 = temps
>>> t1['Stockholm'] = 10.0    # change t1
>>> temps                     # temps is also changed!
{'Stockholm': 10.0, 'Paris': 17.5, 'London': 15.4, 'Madrid': 26.0}
>>> t2 = temps.copy()         # take a copy
>>> t2['Paris'] = 16
>>> t1['Paris']               # t1 was not changed
17.5
Recall the same for lists:
>>> L = [1, 2, 3]
>>> M = L
>>> M[1] = 8
>>> L[1]
8
>>> M = L[:]    # take copy of L
>>> M[2] = 0
>>> L[2]
3
Any constant object can be used as key
- So far: key is text (string object)
- Keys can be any immutable (constant) object (!)
>>> d = {1: 34, 2: 67, 3: 0}          # key is int
>>> d = {13: 'Oslo', 15.4: 'London'}  # possible
>>> d = {(0,0): 4, (1,-1): 5}         # key is tuple
>>> d = {[0,0]: 4, [-1,1]: 5}         # list is mutable/changeable
...
TypeError: unhashable type: 'list'
Example: Polynomials represented by dictionaries
The information in the polynomial $$ p(x)=-1 + x^2 + 3x^7 $$ can be represented by a dict with power as key (int) and coefficient as value (float):
p = {0: -1, 2: 1, 7: 3}
Evaluate such a polynomial \( \sum_{i\in I} c_ix^i \) for some \( x \):
def eval_poly_dict(poly, x):
    sum = 0.0
    for power in poly:
        sum += poly[power]*x**power
    return sum
Short pro version:
def eval_poly_dict2(poly, x):
    # Python's sum can add elements of an iterator
    return sum(poly[power]*x**power for power in poly)
Polynomials can also be represented by lists
The list index corresponds to the power, e.g., the data in \( -1 + x^2 + 3x^7 \) is represented as
p = [-1, 0, 1, 0, 0, 0, 0, 3]
The general polynomial \( \sum_{i=0}^N c_ix^i \) is stored as [c0, c1, c2, ..., cN].
Evaluate such a polynomial \( \sum_{i=0}^N c_ix^i \) for some \( x \):
def eval_poly_list(poly, x):
    sum = 0
    for power in range(len(poly)):
        sum += poly[power]*x**power
    return sum
What is best for polynomials: lists or dictionaries?
Dictionaries need only store the nonzero terms. Compare dict vs list for the polynomial \( 1 - x^{200} \):
p = {0: 1, 200: -1}         # len(p) is 2
p = [1, 0, 0, 0, ..., -1]   # len(p) is 201
Dictionaries can easily handle negative powers, e.g., \( {1\over2}x^{-3} + 2x^4 \)
p = {-3: 0.5, 4: 2}
print eval_poly_dict(p, x=4)
Quick recap of file reading
infile = open(filename, 'r')   # open file for reading
line    = infile.readline()    # read the next line
filestr = infile.read()        # read rest of file into string
lines   = infile.readlines()   # read rest of file into list
for line in infile:            # read rest of file line by line
infile.close()                 # recall to close!
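As a complement to the recap above, here is a minimal sketch (Python 3 syntax) of the same reading calls wrapped in a `with` statement, which closes the file automatically. The sample file name and its contents are made up so that the example is self-contained.

```python
import os
import tempfile

# Create a small sample file so the sketch runs on its own
# (file name and contents are made up for illustration).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as outfile:
    outfile.write('first line\nsecond line\nthird line\n')

# The with statement closes the file when the block ends, even on errors
with open(path, 'r') as infile:
    first = infile.readline()                 # read the next (first) line
    rest = [line.strip() for line in infile]  # rest of file, line by line

print(first.strip())
print(rest)
```

With this form there is no explicit `close()` call to forget.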
Example: Read file data into a dictionary
Data file:
Oslo:       21.8
London:     18.1
Berlin:     19
Paris:      23
Rome:       26
Helsinki:   17.8
Store in dict, with city names as keys and temperatures as values
Program:
infile = open('deg2.dat', 'r')
temps = {}                  # start with empty dict
for line in infile.readlines():
    city, temp = line.split()
    city = city[:-1]        # remove last char (:)
    temps[city] = float(temp)
A tabular file can be read into a nested dictionary
Data file table.dat:
        A       B       C       D
1       11.7    0.035   2017    99.1
2       9.2     0.037   2019    101.2
3       12.2    no      no      105.2
4       10.1    0.031   no      102.1
5       9.1     0.033   2009    103.3
6       8.7     0.036   2015    101.9
Create a dict data[p][i] (dict of dict) to hold measurement no. i (1, 2, etc.) of property p ('A', 'B', etc.)
We must first develop the plan (algorithm) for doing this
- Examine the first line:
  - split it into words
  - initialize a dictionary with the property names as keys and empty dictionaries {} as values
- For each of the remaining lines:
  - split line into words
  - for each word after the first: if word is not no, convert to float and store
Good exercise: do this now!
(See the book for a complete implementation.)
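One possible way to carry out the plan above is sketched below in Python 3 syntax. It is not the book's implementation: the function name `read_table` is hypothetical, and the table is embedded as a string, instead of being read from table.dat, so that the sketch runs on its own.

```python
# Sample text mirroring table.dat above (embedded so the sketch runs as is)
table_text = """\
        A       B       C       D
1       11.7    0.035   2017    99.1
2       9.2     0.037   2019    101.2
3       12.2    no      no      105.2
4       10.1    0.031   no      102.1
5       9.1     0.033   2009    103.3
6       8.7     0.036   2015    101.9
"""

def read_table(text):
    """Return data[p][i]: measurement no. i of property p."""
    lines = text.splitlines()
    properties = lines[0].split()          # first line: property names
    data = {p: {} for p in properties}     # one empty dict per property
    for line in lines[1:]:
        words = line.split()
        if not words:
            continue                       # skip blank lines
        i = int(words[0])                  # measurement number
        for p, word in zip(properties, words[1:]):
            if word != 'no':               # 'no' marks a missing value
                data[p][i] = float(word)
    return data

data = read_table(table_text)
print(data['A'][3])
print(sorted(data['C']))
```

Missing values are simply not stored, so `3 in data['C']` can be used to test whether a measurement exists.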
Example: Download data from the web and visualize
Problem:
- Compare the stock prices of Microsoft, Apple, and Google over decades
- http://finance.yahoo.com/ offers such data in files with tabular form
Date,Open,High,Low,Close,Volume,Adj Close
2014-02-03,502.61,551.19,499.30,545.99,12244400,545.99
2014-01-02,555.68,560.20,493.55,500.60,15698500,497.62
2013-12-02,558.00,575.14,538.80,561.02,12382100,557.68
2013-11-01,524.02,558.33,512.38,556.07,9898700,552.76
2013-10-01,478.45,539.25,478.28,522.70,12598400,516.57
...
1984-10-01,25.00,27.37,22.50,24.87,5654600,2.73
1984-09-07,26.50,29.00,24.62,25.12,5328800,2.76
We need to analyze the file format to find the algorithm for interpreting the content
File format:
- Columns are separated by comma
- First column is the date, the final is the price of interest
- The prices start at different dates
We need algorithms before we can write code
Algorithm for reading data:
- skip first line
- read line by line
- split each line wrt. comma
- store first word (date) in a list of dates
- store final word (price) in a list of prices
- collect date and price list in a dictionary (key is company)
- make a function for reading one company's file
Plotting:
- Convert year-month-day time specifications in strings into year coordinates along the x axis
- Note that the companies' price history starts at different years
No code is presented here...
See the book for all details. If you understand this quite comprehensive example, you know and understand a lot!
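The slides leave the code to the book, but the reading algorithm listed above can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not the book's program: the function name `read_prices` and the dictionary key `'SomeCompany'` are hypothetical, and three sample lines are embedded as a string instead of being read from a downloaded file.

```python
# A few lines in the file format shown above, embedded so the sketch runs as is
sample = """\
Date,Open,High,Low,Close,Volume,Adj Close
2014-02-03,502.61,551.19,499.30,545.99,12244400,545.99
2014-01-02,555.68,560.20,493.55,500.60,15698500,497.62
2013-12-02,558.00,575.14,538.80,561.02,12382100,557.68
"""

def read_prices(text):
    """Return (dates, prices) from comma-separated stock data."""
    dates, prices = [], []
    lines = text.splitlines()
    for line in lines[1:]:                # skip first line (column names)
        words = line.split(',')           # split wrt. comma
        dates.append(words[0])            # first word: date
        prices.append(float(words[-1]))   # final word: adjusted close
    return dates, prices

# Collect date and price lists in a dictionary with the company as key.
# 'SomeCompany' is a placeholder; the slides do not name the company.
data = {}
dates, prices = read_prices(sample)
data['SomeCompany'] = {'dates': dates, 'prices': prices}
print(data['SomeCompany']['prices'])
```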
Plot of normalized stock prices in logarithmic scale
Much computer history in this plot:
String manipulation
>>> s = 'This is a string'
>>> s.split()
['This', 'is', 'a', 'string']
>>> 'This' in s
True
>>> s.find('is')    # first match is inside 'This'
2
>>> ', '.join(s.split())
'This, is, a, string'
String manipulation is key to interpreting the content of files
- Text in Python is represented as strings
- Inspecting and manipulating strings is the way we can understand the contents of files
- Plan: first show basic operations, then address real examples
Sample string used for illustrations:
>>> s = 'Berlin: 18.4 C at 4 pm'
Strings behave much like lists/tuples - they are a sequence of characters:
>>> s[0]
'B'
>>> s[1]
'e'
>>> s[-1]
'm'
Extracting substrings
Substrings are extracted just like slices of lists and arrays:
>>> s
'Berlin: 18.4 C at 4 pm'
>>> s[8:]     # from index 8 to the end of the string
'18.4 C at 4 pm'
>>> s[8:12]   # index 8, 9, 10 and 11 (not 12!)
'18.4'
>>> s[8:-1]
'18.4 C at 4 p'
>>> s[8:-8]
'18.4 C'
Find start of substring:
>>> s.find('Berlin')  # where does 'Berlin' start?
0                     # at index 0
>>> s.find('pm')
20
>>> s.find('Oslo')    # not found
-1
Checking if a substring is contained in a string
>>> 'Berlin' in s
True
>>> 'Oslo' in s
False
>>> if 'C' in s:
...     print 'C found'
... else:
...     print 'no C'
...
C found
Substituting a substring by some other string
s.replace(s1, s2): replace s1 by s2
>>> s.replace(' ', '__')
'Berlin:__18.4__C__at__4__pm'
>>> s.replace('Berlin', 'Bonn')
'Bonn: 18.4 C at 4 pm'
Example: replace the text before the first colon by 'Bonn'
>>> s
'Berlin: 18.4 C at 4 pm'
>>> s.replace(s[:s.find(':')], 'Bonn')
'Bonn: 18.4 C at 4 pm'
1) s.find(':') returns 6, 2) s[:6] is 'Berlin', 3) Berlin is replaced by 'Bonn'
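One detail is worth keeping in mind: `replace` substitutes every occurrence of the substring, not only the one before the colon. The sketch below (Python 3 syntax) uses a modified sample string, an assumption made here so that 'Berlin' occurs twice, and contrasts `replace` with plain slicing.

```python
s = 'Berlin: 18.4 C at 4 pm in Berlin'   # 'Berlin' occurs twice here

# replace() substitutes every occurrence of the substring
replaced = s.replace(s[:s.find(':')], 'Bonn')
print(replaced)    # Bonn: 18.4 C at 4 pm in Bonn

# Slicing touches only the text before the first colon
sliced = 'Bonn' + s[s.find(':'):]
print(sliced)      # Bonn: 18.4 C at 4 pm in Berlin
```

For the original sample string both approaches give the same result, since 'Berlin' appears only once.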
Splitting a string into a list of substrings
s.split(sep): split s into a list of substrings separated by sep (no separator implies split wrt whitespace):
>>> s
'Berlin: 18.4 C at 4 pm'
>>> s.split(':')
['Berlin', ' 18.4 C at 4 pm']
>>> s.split()
['Berlin:', '18.4', 'C', 'at', '4', 'pm']
Try to understand this one:
>>> s.split(':')[1].split()[0]
'18.4'
>>> deg = float(_)  # _ represents the last result
>>> deg
18.4
Splitting a string into lines
- Very often, a string contains lots of text and we want to split the text into separate lines
- Lines may be separated by different control characters on different platforms: \n on Unix/Linux/Mac, \r\n on Windows
>>> t = '1st line\n2nd line\n3rd line'   # Unix-line
>>> print t
1st line
2nd line
3rd line
>>> t.split('\n')
['1st line', '2nd line', '3rd line']
>>> t.splitlines()
['1st line', '2nd line', '3rd line']
>>> t = '1st line\r\n2nd line\r\n3rd line'  # Windows
>>> t.split('\n')
['1st line\r', '2nd line\r', '3rd line']    # not what we want
>>> t.splitlines()   # cross platform!
['1st line', '2nd line', '3rd line']
Strings are constant - immutable - objects
You cannot change a string in-place (as you can with lists and arrays) - every change of a string results in a new string
>>> s[18] = 5
...
TypeError: 'str' object does not support item assignment
>>> # build a new string by adding pieces of s:
>>> s2 = s[:18] + '5' + s[19:]
>>> s2
'Berlin: 18.4 C at 5 pm'
Stripping off leading/trailing whitespace
>>> s = '   text with leading/trailing space   \n'
>>> s.strip()
'text with leading/trailing space'
>>> s.lstrip()   # left strip
'text with leading/trailing space   \n'
>>> s.rstrip()   # right strip
'   text with leading/trailing space'
Some convenient string functions
>>> '214'.isdigit()
True
>>> '  214 '.isdigit()
False
>>> '2.14'.isdigit()
False
>>> s = 'Berlin: 18.4 C at 4 pm'   # back to the sample string
>>> s.lower()
'berlin: 18.4 c at 4 pm'
>>> s.upper()
'BERLIN: 18.4 C AT 4 PM'
>>> s.startswith('Berlin')
True
>>> s.endswith('am')
False
>>> '    '.isspace()   # blanks
True
>>> '  \n'.isspace()   # newline
True
>>> '  \t '.isspace()  # TAB
True
>>> ''.isspace()       # empty string
False
Joining a list of substrings to a new string
We can put strings together with a delimiter in between:
>>> strings = ['Newton', 'Secant', 'Bisection']
>>> ', '.join(strings)
'Newton, Secant, Bisection'
These are inverse operations:
t = delimiter.join(stringlist)
stringlist = t.split(delimiter)
Split off the first two words on a line:
>>> line = 'This is a line of words separated by space'
>>> words = line.split()
>>> line2 = ' '.join(words[2:])
>>> line2
'a line of words separated by space'
Example: Read pairs of numbers (x,y) from a file
Sample file:
(1.3,0)    (-1,2)    (3,-1.5)
(0,1)      (1,0)     (1,1)
(0,-0.01)  (10.5,-1) (2.5,-2.5)
Algorithm:
- Read line by line
- For each line, split line into words
- For each word, strip off the parentheses and split the rest wrt comma
The lawmaking for reading pairs
lines = open('read_pairs.dat', 'r').readlines()
pairs = []   # list of (n1, n2) pairs of numbers
for line in lines:
    words = line.split()
    for word in words:
        word = word[1:-1]  # strip off parentheses
        n1, n2 = word.split(',')
        n1 = float(n1);  n2 = float(n2)
        pair = (n1, n2)
        pairs.append(pair)
Output of a pretty print of the pairs list
[(1.3, 0.0),
 (-1.0, 2.0),
 (3.0, -1.5),
 (0.0, 1.0),
 (1.0, 0.0),
 (1.0, 1.0),
 (0.0, -0.01),
 (10.5, -1.0),
 (2.5, -2.5)]
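The slides do not show the call that produced this pretty print; a natural candidate is the standard `pprint` module, which is an assumption on our part. The sketch below (Python 3 syntax) repeats the parsing on an embedded copy of the sample data and then prints the list with `pprint.pprint`.

```python
import pprint

# Sample text in the file format shown earlier (embedded to be self-contained)
text = """\
(1.3,0)    (-1,2)    (3,-1.5)
(0,1)      (1,0)     (1,1)
(0,-0.01)  (10.5,-1) (2.5,-2.5)
"""

pairs = []
for line in text.splitlines():
    for word in line.split():
        n1, n2 = word[1:-1].split(',')   # strip parentheses, split at comma
        pairs.append((float(n1), float(n2)))

# pprint breaks long lists over several lines for readability
pprint.pprint(pairs)
```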
Alternative solution: Python syntax in file format
Suppose the file format
(1.3, 0)    (-1, 2)    (3, -1.5)
...
was slightly different:
[(1.3, 0), (-1, 2), (3, -1.5), ... ]
Running eval on the perturbed format produces the desired list!
text = open('read_pairs2.dat', 'r').read()
text = '[' + text.replace(')', '),') + ']'
pairs = eval(text)
Web pages are nothing but text files
The text is a mix of HTML commands and the text displayed in the browser:
<html>
<body bgcolor="orange">
<h1>A Very Simple Web Page</h1> <!-- headline -->
Ordinary text is written as ordinary text, but when we
need headlines, lists,
<ul>
<li><em>emphasized words</em>, or
<li> <b>boldfaced words</b>,
</ul>
we need to embed the text inside HTML tags. We can also
insert GIF or PNG images, taken from other Internet sites,
if desired.
<hr> <!-- horizontal line -->
<img src="http://www.simula.no/simula_logo.gif">
</body>
</html>
The web page generated by HTML code from the previous slide
Programs can extract information from web pages
- A program can download a web page, as an HTML file, and extract data by interpreting the text in the file (using string operations).
- Example: climate data from the UK
Download oxforddata.txt to a local file Oxford.txt:
import urllib    # Python 2; in Python 3, urlretrieve lives in urllib.request
baseurl = 'http://www.metoffice.gov.uk/climate/uk/stationdata'
filename = 'oxforddata.txt'
url = baseurl + '/' + filename
urllib.urlretrieve(url, filename='Oxford.txt')
The structure of the Oxford.txt weather data file
Oxford
Location: 4509E 2072N, 63 metres amsl
Estimated data is marked with a * after the value.
Missing data (more than 2 days missing in month) is marked by  ---.
Sunshine data taken from an automatic ...
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1853   1    8.4     2.7       4    62.8     ---
   1853   2    3.2    -1.8      19    29.3     ---
   1853   3    7.7    -0.6      20    25.9     ---
   1853   4   12.6     4.5       0    60.1     ---
   1853   5   16.8     6.1       0    59.5     ---
...
   2010   5   17.6     7.3       0    28.6   207.4
   2010   6   23.0    11.1       0    34.5   230.5
   2010   7   23.3*   14.1*      0*   24.4*  184.4*  Provisional
   2010  10   14.6     7.4       2    43.5   128.8   Provisional
Reading the climate data
Algorithm:
- Read the place and location in the file header
- Skip the next 5 (for us uninteresting) lines
- Read the column data and store in dictionary
- Test for numbers with special annotation, "provisional" column, etc.
Program, part 1:
local_file = 'Oxford.txt'
infile = open(local_file, 'r')
data = {}
data['place'] = infile.readline().strip()
data['location'] = infile.readline().strip()
# Skip the next 5 lines
for i in range(5):
    infile.readline()
Reading the climate data - program, part 2
Program, part 2:
data['data'] = {}
for line in infile:
    columns = line.split()
    year = int(columns[0])
    month = int(columns[1])
    if columns[-1] == 'Provisional':
        del columns[-1]
    for i in range(2, len(columns)):
        if columns[i] == '---':
            columns[i] = None
        elif columns[i][-1] == '*' or columns[i][-1] == '#':
            # Strip off trailing character
            columns[i] = float(columns[i][:-1])
        else:
            columns[i] = float(columns[i])
Reading the climate data - program, part 3
Program, part 3:
for line in infile:
    ...
    tmax, tmin, air_frost, rain, sun = columns[2:]
    if not year in data['data']:
        data['data'][year] = {}
    data['data'][year][month] = {'tmax': tmax,
                                 'tmin': tmin,
                                 'air frost': air_frost,
                                 'sun': sun}
Summary of dictionary functionality
| Construction | Meaning |
|---|---|
| a = {} | initialize an empty dictionary |
| a = {'point': [0,0.1], 'value': 7} | initialize a dictionary |
| a = dict(point=[2,7], value=3) | initialize a dictionary w/string keys |
| a.update(b) | add/update key-value pairs from b in a |
| a.update(key1=value1, key2=value2) | add/update key-value pairs in a |
| a['hide'] = True | add new key-value pair to a |
| a['point'] | get value corresponding to key point |
| for key in a: | loop over keys in unknown order |
| for key in sorted(a): | loop over keys in alphabetic order |
| 'value' in a | True if string value is a key in a |
| del a['point'] | delete a key-value pair from a |
| list(a.keys()) | list of keys |
| list(a.values()) | list of values |
| len(a) | number of key-value pairs in a |
| isinstance(a, dict) | is True if a is a dictionary |
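A short sketch (Python 3 syntax) that exercises several rows of the table in sequence; the second dictionary `b` is made up for the illustration.

```python
a = {'point': [0, 0.1], 'value': 7}
b = dict(value=3, hide=True)      # made-up dict with string keys

a.update(b)                       # 'value' is overwritten, 'hide' is added
keys = sorted(a)                  # keys in alphabetic order
del a['point']                    # delete a key-value pair

print(keys)
print(len(a), 'value' in a, isinstance(a, dict))
```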
Summary of some string operations
s = 'Berlin: 18.4 C at 4 pm'
s[8:17]             # extract substring
s.find(':')         # index where first ':' is found
s.split(':')        # split into substrings
s.split()           # split wrt whitespace
'Berlin' in s       # test if substring is in s
s.replace('18.4', '20')
s.lower()           # lower case letters only
s.upper()           # upper case letters only
s.split()[4].isdigit()
s.strip()           # remove leading/trailing blanks
', '.join(list_of_words)
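Combining these string operations with a dictionary gives a compact way to read key-value pairs separated by whitespace, which is the task named in the title. The sketch below (Python 3 syntax) embeds three lines in the format of the temperature file shown earlier, so that it runs without any file on disk.

```python
# Three lines in the 'City: temperature' format used earlier
text = """\
Oslo:       21.8
London:     18.1
Berlin:     19
"""

temps = {}
for line in text.splitlines():
    if not line.strip():
        continue                        # skip blank lines
    key, value = line.split()           # split wrt whitespace
    temps[key.rstrip(':')] = float(value)

print(temps)
```

Here `rstrip(':')` plays the role of the `city[:-1]` slice, with the difference that it leaves the key untouched when no trailing colon is present.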
Source: http://hplgit.github.io/scipro-primer/slides/dictstring/html/dictstring-solarized.html