ArchiveOrangemail archive

General discussion list for the Python programming language


python-list.python.org
(List home) (Recent threads) (97 other Python lists)

Subscription Options

  • RSS or Atom: Read-only subscription using a browser or aggregator. This is the recommended way if you don't need to send messages to the list. You can learn more about feed syndication and clients here.
  • Conventional: All messages are delivered to your mail address, and you can reply. To subscribe, send an email to the list's subscribe address with "subscribe" in the subject line, or visit the list's homepage here.
  • High traffic list: 30+ messages per day
  • This list contains about 646,561 messages, beginning Feb 1999
  • 71 messages added yesterday
Report the Spam
This button sends a spam report to the moderator. Please use it sparingly. For other removal requests, read this.
Are you sure? yes no

Re: Help with how to combine two csv files

Ad
Sammy Danso 1336683503Thu, 10 May 2012 20:58:23 +0000 (UTC)
Hi JM and All,
Thank you all very much for your response and help. I managed to work out the problem eventually. But my code is ridiculously long compared to what you have just offered here. I think your code is elegant and should be much faster.  
Thanks  a lot
Sammy Below is my code.  outputfile = codecs.open(csvfile3, 'wb')
 inputfile1 =  codecs.open(csvfile1, 'rb')
 inputfile2 = codecs.open(csvfile2, 'rb')
             
                
                 dictreader2 = csv.DictReader(inputfile2)
                 dictreader1 = csv.DictReader(inputfile1)
                 mergedDict = {}
                 
                 mergedDictb = {}
                 matchedlistA = []
                 matchedlistB = []
                 matchedlist = []
                 cnt = 0
                 cntb = 0
                 for dictline1 in dictreader1:
                         cnt  += 1
                         print cnt
                         mergedDict = dictline1.copy()
                         mergedDict['UniqID']=  cnt
                          matchedlistA.append(mergedDict)
                         
                
                 for dictline2 in dictreader2:
                          cntb  += 1
                          mergedDictb = dictline2.copy()
                          mergedDictb['UniqID']=  cntb
                          matchedlistB.append(mergedDictb)
                         
               
                 for dictline1 in matchedlistA: 
                    for dictline2 in matchedlistB: 
                        if dictline1['UniqID'] == dictline2['UniqID']: 
                            entry = dictline1.copy() 
                            entry.update(dictline2)
                            matchedlist.append(entry)  

--- On Wed, 5/9/12, Jean-Michel Pichavant  wrote:


From: Jean-Michel Pichavant 
Subject: Re: Help with how to combine two csv files
To: "Sammy Danso" 
Cc: 
Date: Wednesday, May 9, 2012, 11:23 AMSammy Danso wrote:
> Hello Experts,
> I am new to python and I have been trying to merge two csv files, and upon several hours of unsuccessful attempts, I have decided to seek for help.
>  the format of the file is as follows. file A has  columns a, b, c and values 1,2,3 for several rows. File B also has columns d,e and values 4,5  for same number of rows as A. the files however do not have any unique column between the two.
> I want an output file C to have columns  a,b,c,d,e with values 1,2,3,4,5
>  I would be very grateful for your help with some code.
>  Thanks very much,
> Sammy
>Post some code so we may point at your problem.

The solution is easy, the code very small.
Something like (pseudo code):

import csv
writer = csv.writer(file3)
for index, row in enumerate(csv.reader(file1)):
   writer.writerow(row + csv.reader(file2)[index])

JM
Dennis Lee Bieber 1336760113Fri, 11 May 2012 18:15:13 +0000 (UTC)
On Thu, 10 May 2012 13:55:40 -0700 (PDT), Sammy Danso
 declaimed the following in
gmane.comp.python.general:> Thank you all very much for your response and help. I managed to work out the problem eventually. But my code is ridiculously long compared to what you have just offered here. I think your code is elegant and should be much faster.  
>  
>  outputfile = codecs.open(csvfile3, 'wb')
>  inputfile1 =  codecs.open(csvfile1, 'rb')
>  inputfile2 = codecs.open(csvfile2, 'rb')
>I'll concede that's a new one for me... I know the csv module is not
rated to work with Unicode... The help system doesn't explain what
happens when no encoding is specified for the input.>                 
>                  dictreader2 = csv.DictReader(inputfile2)
>                  dictreader1 = csv.DictReader(inputfile1)Do you really need dictionary? {see comments below} For one thing,
dictionaries are un-ordered, so the output columns may not be in the
same order as the input...>                  cnt = 0
>                  cntb = 0
>                  for dictline1 in dictreader1:
>                          cnt  += 1
>                          print cnt"cnt" doesn't need to be maintained by hand -- you could use
enumerate on the for loop to get a counter value each time...

			for (cnt, dictline1) in enumerate(dictreader1):

>                          mergedDict = dictline1.copy()What purpose does this copy serve? Both dictline1 and mergedDict are
overwritten on each pass of the loop?

>                          mergedDict['UniqID']=  cnt
>                           matchedlistA.append(mergedDict)And here you appear to use "cnt" merely to add a "key" to the
dictionary containing the line number of the original data. Instead of
this you could just use a list of lists (no dictionary reader, just the
normal reader that returns a list of values). After all, your "cnt"
value is just the index into the list (remembering that lists index
starts at 0, not 1).

	You've essentially read the entire file into memory, converting it
into a list of dictionaries, with each dictionary containing the same
set of keys, to which you've added a key whose value is the record
number.>                          
>                 
>                  for dictline2 in dictreader2:<snip>

	Here you do the same thing with the second file. Identical code
except for the 1/A becoming 2/B... First means of shortening the code
would be to define a function to read one file and return the results.
Then call that function passing each file...

	A = readFile(csvfile1)
	B = readFile(csvfile2)>                
>                  for dictline1 in matchedlistA: 
>                     for dictline2 in matchedlistB: 
>                         if dictline1['UniqID'] == dictline2['UniqID']:Big time waste... Your lists are ALREADY IN RECORD NUMBER ORDER... 

	Instead, for each record in the outer list, you are "reading" ALLentries in the second list, trying to match on a value you wrote to the
records just to keep track of the order of the data.

>                             entry = dictline1.copy() 
>                             entry.update(dictline2)
>                             matchedlist.append(entry) 
>Even more -- you don't break out of the inner list when you do find
the match. If each list contains 3 records, you end up processing 9
comparisons! 

1, 1	save
1, 2	null
1, 3	null
2, 1	null
2, 2	save
2, 3	null
3, 1	null
3, 2	null
3, 3	save

	For 10 records, you do 100 comparisons. AND you are creating a third
list in memory.

	Since you are matching solely by record position, AND only saving
data for records in common to both files (that is, you ignore any data
if a file is longer than the other) the entire operation condenses to:

-=-=-=-=-=-
import csv

FILEIN_1 = "file_1_name.csv"
FILEIN_2 = "file_2_name.csv"
OUTPUT = "output_file_name.csv"

fin1 = open(FILEIN_1, "rb")
fin2 = open(FILEIN_2, "rb")
outf = open(OUTPUT, "wb")

csvin1 = csv.reader(fin1)
csvin2 = csv.reader(fin2)
csvout = csv.writer(outf)

while True:
	line1 = csvin1.next()
	line2 = csvin2.next()
	if not (line1 and line2): break
	csvout.writerow(line1.extend(line2))

outf.close()
fin1.close()
fin2.close()
-=-=-=-=-=-

	BTW: you never showed your output code <G>-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
            HTTP://wlfraed.home.netcom.com/
Home | About | Privacy