Pandas Is Not Reading All of the Entries in a Column When Reading a CSV File in Python

27. Reading and Writing Data in Pandas

By Bernd Klein. Last modified: 01 Feb 2022.


All the powerful data structures like the Series and the DataFrames would be of no avail if the Pandas module did not provide powerful functionalities for reading in and writing out data. It is not only a matter of having functions for interacting with files. To be useful to data scientists, it also needs functions which support the most important data formats, like

Delimiter-separated files, e.g. csv
Microsoft Excel files
HTML
XML
JSON

Delimiter-separated Values


Most people take csv files as a synonym for delimiter-separated values files. They disregard the fact that csv is an acronym for "comma separated values", which is not the case in many situations. Pandas also uses "csv" in contexts in which "dsv" would be more appropriate.

Delimiter-separated values (DSV) are a way of defining and storing two-dimensional arrays of data (for example strings) by separating the values in each row with delimiter characters defined for this purpose. This way of storing data is often used in combination with spreadsheet programs, which can read in and write out data as DSV. DSV files are also used as a general data exchange format.

We call a text file a "delimited text file" if it contains text in DSV format.

For instance, the file dollar_euro.txt is a delimited text file and uses tabs (\t) as delimiters.
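As a minimal sketch of the idea, the following snippet parses a small tab-delimited text with Python's standard csv module. The text is held in a string rather than in the file dollar_euro.txt, and only three of the columns are used; both are assumptions made to keep the example self-contained.

```python
import csv
import io

# Two data rows in DSV format; "\t" is the delimiter (a sample, not the full file)
dsv_text = "Year\tAverage\tWorking days\n2016\t0.901696\t247\n2015\t0.901896\t256\n"

# csv.reader accepts any single-character delimiter, not only commas
rows = list(csv.reader(io.StringIO(dsv_text), delimiter="\t"))

header, data = rows[0], rows[1:]
print(header)
for row in data:
    print(row)
```

Every value arrives as a string here; converting the columns to numbers is one of the things read_csv takes care of.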

Reading CSV and DSV Files

Pandas offers two ways to read in CSV or DSV files to be precise:

DataFrame.from_csv
read_csv

There is no big difference between those two functions, e.g. they have different default values in some cases and read_csv has more parameters. We will focus on read_csv, because DataFrame.from_csv was only kept inside Pandas for reasons of backwards compatibility. It has since been removed (in pandas 1.0), so read_csv is the function to use in current versions.

    import pandas as pd

    exchange_rates = pd.read_csv("/data1/dollar_euro.txt", sep="\t")
    print(exchange_rates)

OUTPUT:

        Year   Average  Min USD/EUR  Max USD/EUR  Working days
    0   2016  0.901696     0.864379     0.959785           247
    1   2015  0.901896     0.830358     0.947688           256
    2   2014  0.753941     0.716692     0.823655           255
    3   2013  0.753234     0.723903     0.783208           255
    4   2012  0.778848     0.743273     0.827198           256
    5   2011  0.719219     0.671953     0.775855           257
    6   2010  0.755883     0.686672     0.837381           258
    7   2009  0.718968     0.661376     0.796495           256
    8   2008  0.683499     0.625391     0.802568           256
    9   2007  0.730754     0.672314     0.775615           255
    10  2006  0.797153     0.750131     0.845594           255
    11  2005  0.805097     0.740357     0.857118           257
    12  2004  0.804828     0.733514     0.847314           259
    13  2003  0.885766     0.791766     0.963670           255
    14  2002  1.060945     0.953562     1.165773           255
    15  2001  1.117587     1.047669     1.192748           255
    16  2000  1.085899     0.962649     1.211827           255
    17  1999  0.939475     0.848176     0.998502           261

As we can see, read_csv automatically used the first line as the names for the columns. It is possible to give other names to the columns. For this purpose, we have to skip the first line by setting the parameter "header" to 0 and we have to assign a list with the column names to the parameter "names":

    import pandas as pd

    exchange_rates = pd.read_csv("/data1/dollar_euro.txt",
                                 sep="\t",
                                 header=0,
                                 names=["year", "min", "max", "days"])
    print(exchange_rates)

OUTPUT:

              year       min       max  days
    2016  0.901696  0.864379  0.959785   247
    2015  0.901896  0.830358  0.947688   256
    2014  0.753941  0.716692  0.823655   255
    2013  0.753234  0.723903  0.783208   255
    2012  0.778848  0.743273  0.827198   256
    2011  0.719219  0.671953  0.775855   257
    2010  0.755883  0.686672  0.837381   258
    2009  0.718968  0.661376  0.796495   256
    2008  0.683499  0.625391  0.802568   256
    2007  0.730754  0.672314  0.775615   255
    2006  0.797153  0.750131  0.845594   255
    2005  0.805097  0.740357  0.857118   257
    2004  0.804828  0.733514  0.847314   259
    2003  0.885766  0.791766  0.963670   255
    2002  1.060945  0.953562  1.165773   255
    2001  1.117587  1.047669  1.192748   255
    2000  1.085899  0.962649  1.211827   255
    1999  0.939475  0.848176  0.998502   261
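In the output above, the year values ended up as row labels and the name "year" sits over the averages. The list holds four names while the file has five columns, and in that situation read_csv apparently treats the leading column as the index. The following is a sketch with one name per column, assuming that an ordinary "year" column is what is wanted. The name "average" and the in-memory sample standing in for the file are illustrative additions.

```python
import io
import pandas as pd

# Stand-in for the first lines of dollar_euro.txt (tab-delimited, five columns)
text = (
    "Year\tAverage\tMin USD/EUR\tMax USD/EUR\tWorking days\n"
    "2016\t0.901696\t0.864379\t0.959785\t247\n"
    "2015\t0.901896\t0.830358\t0.947688\t256\n"
)

# header=0 marks the first line as the header row, which `names` then replaces
names = ["year", "average", "min", "max", "days"]
rates = pd.read_csv(io.StringIO(text), sep="\t", header=0, names=names)
print(rates)
```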

Exercise 1

The file "countries_population.csv" is a csv file, containing the population numbers of all countries (July 2014). The delimiter of the file is a space and commas are used to separate groups of thousands in the numbers. The method 'head(n)' of a DataFrame can be used to give out only the first n rows or lines. Read the file into a DataFrame.

Solution:

    pop = pd.read_csv("/data1/countries_population.csv",
                      header=None,
                      names=["Country", "Population"],
                      index_col=0,
                      quotechar="'",
                      sep=" ",
                      thousands=",")
    print(pop.head(5))

OUTPUT:

                    Population
    Country
    China           1355692576
    India           1236344631
    European Union   511434812
    United States    318892103
    Indonesia        253609643
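The roles of "quotechar" and "thousands" in this solution can be tried out on a few lines held in memory. The sample below is a sketch: the population figures come from the output above, but the exact layout of the file (names containing a space wrapped in single quotes) is an assumption inferred from the parameters used.

```python
import io
import pandas as pd

# Assumed layout: space-delimited, multi-word names quoted with ',
# thousands grouped by commas
text = "China 1,355,692,576\n'United States' 318,892,103\nIndonesia 253,609,643\n"

pop = pd.read_csv(io.StringIO(text),
                  header=None,
                  names=["Country", "Population"],
                  index_col=0,
                  quotechar="'",    # keeps 'United States' together despite the space
                  sep=" ",
                  thousands=",")    # lets 1,355,692,576 be parsed as an integer
print(pop)
print(pop.dtypes)
```

Without the "thousands" argument the Population column would be read as text, which is a frequent reason for numeric columns not behaving as expected after read_csv.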

Writing csv Files


We can create csv (or dsv) files with the method "to_csv". Before we do this, we will prepare some data to output, which we will write to a file. We have two csv files with population data for various countries. countries_male_population.csv contains the figures of the male populations and countries_female_population.csv correspondingly the numbers for the female populations. We will create a new csv file with the sum:

    column_names = ["Country"] + list(range(2002, 2013))
    male_pop = pd.read_csv("/data1/countries_male_population.csv",
                           header=None,
                           index_col=0,
                           names=column_names)
    female_pop = pd.read_csv("/data1/countries_female_population.csv",
                             header=None,
                             index_col=0,
                             names=column_names)
    population = male_pop + female_pop

	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012
Country
Australia	19640979.0	19872646	20091504	20339759	20605488	21015042	21431781	21874920	22342398	22620554	22683573
Austria	8139310.0	8067289	8140122	8206524	8265925	8298923	8331930	8355260	8375290	8404252	8443018
Belgium	10309725.0	10355844	10396421	10445852	10511382	10584534	10666866	10753080	10839905	10366843	11035958
Canada	NaN	31361611	31372587	31989454	32299496	32649482	32927372	33327337	33334414	33927935	34492645
Czech Republic	10269726.0	10203269	10211455	10220577	10251079	10287189	10381130	10467542	10506813	10532770	10505445
Denmark	5368354.0	5383507	5397640	5411405	5427459	5447084	5475791	5511451	5534738	5560628	5580516
Finland	5194901.0	5206295	5219732	5236611	5255580	5276955	5300484	5326314	5351427	5375276	5401267
France	59337731.0	59630121	59900680	62518571	62998773	63392140	63753140	64366962	64716310	65129746	65394283
Germany	82440309.0	82536680	82531671	82500849	82437995	82314906	82217837	82002356	81802257	81751602	81843743
Greece	10988000.0	11006377	11040650	11082751	11125179	11171740	11213785	11260402	11305118	11309885	11290067
Hungary	10174853.0	10142362	10116742	10097549	10076581	10066158	10045401	10030975	10014324	9985722	9957731
Iceland	286575.0	288471	290570	293577	299891	307672	315459	319368	317630	318452	319575
Ireland	3882683.0	3963636	4027732	4109173	4209019	4239848	4401335	4450030	4467854	4569864	4582769
Italy	56993742.0	57321070	57888245	58462375	58751711	59131287	59619290	60045068	60340328	60626442	60820696
Japan	127291000.0	127435000	127620000	127687000	127767994	127770000	127771000	127692000	127510000	128057000	127799000
Korea	47639618.0	47925318	48082163	48138077	48297184	48456369	48606787	48746693	48874539	49779440	50004441
Luxembourg	444050.0	448300	451600	455000	469086	476187	483799	493500	502066	511840	524853
Mexico	101826249.0	103039964	104213503	103001871	103946866	104874282	105790725	106682518	107550697	108396211	115682867
Netherlands	16105285.0	16192572	16258032	16305526	16334210	16357992	16405399	16485787	16574989	16655799	16730348
New Zealand	3939130.0	4009200	4062500	4100570	4139470	4228280	4268880	4315840	4367740	4405150	4433100
Norway	4524066.0	4552252	4577457	4606363	4640219	4681134	4737171	4799252	4858199	4920305	4985870
Poland	38632453.0	38218531	38190608	38173835	38157055	38125479	38115641	38135876	38167329	38200037	38538447
Portugal	10335559.0	10407465	10474685	10529255	10569592	10599095	10617575	10627250	10637713	10636979	10542398
Slovak Republic	5378951.0	5379161	5380053	5384822	5389180	5393637	5400998	5412254	5424925	5435273	5404322
Spain	40409330.0	41550584	42345342	43038035	43758250	44474631	45283259	45828172	45989016	46152926	46818221
Sweden	8909128.0	8940788	8975670	9011392	9047752	9113257	9182927	9256347	9340682	9415570	9482855
Switzerland	7261210.0	7313853	7364148	7415102	7459128	7508739	7593494	7701856	7785806	7870134	7954662
Turkey	NaN	70171979	70689500	71607500	72519974	72519974	70586256	71517100	72561312	73722988	74724269
United Kingdom	58706905.0	59262057	59699828	60059858	60412870	60781346	61179260	61595094	62026962	62498612	63256154
United States	277244916.0	288774226	290810719	294442683	297308143	300184434	304846731	305127551	307756577	309989078	312232049

    population.to_csv("/data1/countries_total_population.csv")
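The table above shows NaN for Canada and Turkey in 2002, and the whole 2002 column is printed as floats. A likely explanation, assumed here, is that a value is missing in at least one of the input files: the sum of a number and a missing value is NaN, and a column containing NaN is stored as float. The sketch below reproduces this with two tiny invented frames and writes the result to an in-memory buffer instead of a file.

```python
import io
import pandas as pd

# Tiny stand-ins for the male and female tables (numbers are invented)
male = pd.DataFrame({2002: [10, 20], 2003: [11, 21]}, index=["A", "B"])
female = pd.DataFrame({2002: [12.0, float("nan")], 2003: [13, 23]}, index=["A", "B"])

total = male + female   # element-wise sum, aligned on index and columns

buffer = io.StringIO()
total.to_csv(buffer)    # same method as above, writing to memory instead of a file
csv_text = buffer.getvalue()
print(csv_text)
```

By default to_csv writes a missing value as an empty field, so the row for "B" has nothing between its first two commas.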

We want to create a new DataFrame with all the information, i.e. female, male and complete population. This means that we have to introduce a hierarchical index. Before we do it on our DataFrame, we will introduce this problem in a simple example:

    import pandas as pd

    shop1 = {"foo": {2010: 23, 2011: 25}, "bar": {2010: 13, 2011: 29}}
    shop2 = {"foo": {2010: 223, 2011: 225}, "bar": {2010: 213, 2011: 229}}

    shop1 = pd.DataFrame(shop1)
    shop2 = pd.DataFrame(shop2)
    both_shops = shop1 + shop2
    print("Sales of shop1:\n", shop1)
    print("\nSales of both shops\n", both_shops)

OUTPUT:

    Sales of shop1:
           foo  bar
    2010   23   13
    2011   25   29

    Sales of both shops
           foo  bar
    2010  246  226
    2011  250  258

    shops = pd.concat([shop1, shop2], keys=["one", "two"])
    shops

		foo	bar
one	2010	23	13
one	2011	25	29
two	2010	223	213
two	2011	225	229

We want to swap the hierarchical indices. For this we will utilise 'swaplevel':

    # swaplevel returns a new DataFrame, so the result has to be assigned
    shops = shops.swaplevel()
    shops.sort_index(inplace=True)
    shops

		foo	bar
2010	one	23	13
2010	two	223	213
2011	one	25	29
2011	two	225	229
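One detail is easy to miss with 'swaplevel': it does not change the DataFrame it is called on, but returns a new one. The following self-contained sketch rebuilds the two shop tables and keeps the swapped result under a separate name (the name "swapped" is an illustrative choice), so that both index orders can be compared.

```python
import pandas as pd

shop1 = pd.DataFrame({"foo": {2010: 23, 2011: 25}, "bar": {2010: 13, 2011: 29}})
shop2 = pd.DataFrame({"foo": {2010: 223, 2011: 225}, "bar": {2010: 213, 2011: 229}})
shops = pd.concat([shop1, shop2], keys=["one", "two"])

# swaplevel() leaves `shops` untouched and returns a new DataFrame;
# sort_index() then groups the rows by year
swapped = shops.swaplevel().sort_index()

print(shops.index.tolist())    # outer level: shop key
print(swapped.index.tolist())  # outer level: year
```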

We will get back to our initial problem with the population figures. We will apply the same steps to those DataFrames:

    pop_complete = pd.concat([population.T, male_pop.T, female_pop.T],
                             keys=["total", "male", "female"])
    df = pop_complete.swaplevel()
    df.sort_index(inplace=True)
    df[["Austria", "Australia", "France"]]

	Country	Austria	Australia	France
2002	female	4179743.0	9887846.0	30510073.0
	male	3959567.0	9753133.0	28827658.0
	total	8139310.0	19640979.0	59337731.0
2003	female	4158169.0	9999199.0	30655533.0
	male	3909120.0	9873447.0	28974588.0
	total	8067289.0	19872646.0	59630121.0
2004	female	4190297.0	10100991.0	30789154.0
	male	3949825.0	9990513.0	29111526.0
	total	8140122.0	20091504.0	59900680.0
2005	female	4220228.0	10218321.0	32147490.0
	male	3986296.0	10121438.0	30371081.0
	total	8206524.0	20339759.0	62518571.0
2006	female	4246571.0	10348070.0	32390087.0
	male	4019354.0	10257418.0	30608686.0
	total	8265925.0	20605488.0	62998773.0
2007	female	4261752.0	10570420.0	32587979.0
	male	4037171.0	10444622.0	30804161.0
	total	8298923.0	21015042.0	63392140.0
2008	female	4277716.0	10770864.0	32770860.0
	male	4054214.0	10660917.0	30982280.0
	total	8331930.0	21431781.0	63753140.0
2009	female	4287213.0	10986535.0	33208315.0
	male	4068047.0	10888385.0	31158647.0
	total	8355260.0	21874920.0	64366962.0
2010	female	4296197.0	11218144.0	33384930.0
	male	4079093.0	11124254.0	31331380.0
	total	8375290.0	22342398.0	64716310.0
2011	female	4308915.0	11359807.0	33598633.0
	male	4095337.0	11260747.0	31531113.0
	total	8404252.0	22620554.0	65129746.0
2012	female	4324983.0	11402769.0	33723892.0
	male	4118035.0	11280804.0	31670391.0
	total	8443018.0	22683573.0	65394283.0

    df.to_csv("/data1/countries_total_population.csv")
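When a DataFrame with a hierarchical index is written with to_csv, each index level becomes an ordinary leading column in the file. To get the hierarchy back when reading, read_csv can be told which columns form the index. The sketch below uses three values from the table above and an in-memory buffer instead of a file. The level names "year" and "group" are illustrative choices, not part of the original data.

```python
import io
import pandas as pd

# Small frame with a two-level row index
index = pd.MultiIndex.from_tuples(
    [(2002, "female"), (2002, "male"), (2002, "total")],
    names=["year", "group"],
)
df_small = pd.DataFrame({"Austria": [4179743.0, 3959567.0, 8139310.0]}, index=index)

buffer = io.StringIO()
df_small.to_csv(buffer)   # both index levels are written as leading columns
buffer.seek(0)

# index_col=[0, 1] rebuilds the hierarchical index from those two columns
restored = pd.read_csv(buffer, index_col=[0, 1])
print(restored)
```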

Exercise 2

Read in the dsv file (csv) bundeslaender.txt. Create a new file with the columns 'land', 'area', 'female', 'male', 'population' and 'density' (inhabitants per square kilometre).
Print out the rows where the area is greater than 30000 and the population is greater than 10000.
Print the rows where the density is greater than 300.

    lands = pd.read_csv('/data1/bundeslaender.txt', sep=" ")
    print(lands.columns.values)

OUTPUT:

['land' 'area' 'male' 'female']

    # swap the columns of our DataFrame:
    lands = lands.reindex(columns=['land', 'area', 'female', 'male'])
    lands[:2]

	land	area	female	male
0	Baden-Württemberg	35751.65	5465	5271
1	Bayern	70551.57	6366	6103

    lands.insert(loc=len(lands.columns),
                 column='population',
                 value=lands['female'] + lands['male'])
    lands[:3]

	land	area	female	male	population
0	Baden-Württemberg	35751.65	5465	5271	10736
1	Bayern	70551.57	6366	6103	12469
2	Berlin	891.85	1736	1660	3396

    lands.insert(loc=len(lands.columns),
                 column='density',
                 value=(lands['population'] * 1000 / lands['area']).round(0))
    lands[:4]

	land	area	female	male	population	density
0	Baden-Württemberg	35751.65	5465	5271	10736	300.0
1	Bayern	70551.57	6366	6103	12469	177.0
2	Berlin	891.85	1736	1660	3396	3808.0
3	Brandenburg	29478.61	1293	1267	2560	87.0

    print(lands.loc[(lands.area > 30000) & (lands.population > 10000)])

OUTPUT:

                      land      area  female  male  population  density
    0    Baden-Württemberg  35751.65    5465  5271       10736    300.0
    1               Bayern  70551.57    6366  6103       12469    177.0
    9  Nordrhein-Westfalen  34085.29    9261  8797       18058    530.0
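The last task of the exercise, the rows with a density greater than 300, follows the same pattern with a single condition. A sketch, assuming only the four rows shown in the tables above instead of the complete file (the population figures are apparently given in thousands, which is why the density formula multiplies by 1000):

```python
import pandas as pd

# First four rows of bundeslaender.txt as shown above
lands = pd.DataFrame({
    "land": ["Baden-Württemberg", "Bayern", "Berlin", "Brandenburg"],
    "area": [35751.65, 70551.57, 891.85, 29478.61],
    "female": [5465, 6366, 1736, 1293],
    "male": [5271, 6103, 1660, 1267],
})
lands["population"] = lands["female"] + lands["male"]
lands["density"] = (lands["population"] * 1000 / lands["area"]).round(0)

# Boolean mask: True for every row whose density exceeds 300
dense = lands.loc[lands["density"] > 300]
print(dense)
```

Baden-Württemberg is not selected in this sample, because its rounded density is exactly 300.0 and the comparison is strict.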

Reading and Writing Excel Files

It is also possible to read and write Microsoft Excel files. The Pandas functionalities to read and write Excel files use the modules 'xlrd' and 'openpyxl'. These modules are not automatically installed by Pandas, so you may have to install them manually!

We will use a simple Excel document to demonstrate the reading capabilities of Pandas. The document sales.xls contains two sheets, one called 'week1' and the other one 'week2'.
An Excel file can be read in with the Pandas function "read_excel". This is demonstrated in the following example Python code:

    excel_file = pd.ExcelFile("/data1/sales.xls")
    sheet = pd.read_excel(excel_file)
    sheet

	Weekday	Sales
0	Monday	123432.980000
1	Tuesday	122198.650200
2	Wednesday	134418.515220
3	Thursday	131730.144916
4	Friday	128173.431003

The document "sales.xls" contains two sheets, but we have only been able to read in the first one with "read_excel". A complete Excel document, which can consist of an arbitrary number of sheets, can be read in completely like this:

    docu = {}
    for sheet_name in excel_file.sheet_names:
        docu[sheet_name] = excel_file.parse(sheet_name)

    for sheet_name in docu:
        print("\n" + sheet_name + ":\n", docu[sheet_name])

OUTPUT:

    week1:
          Weekday          Sales
    0     Monday  123432.980000
    1    Tuesday  122198.650200
    2  Wednesday  134418.515220
    3   Thursday  131730.144916
    4     Friday  128173.431003

    week2:
          Weekday          Sales
    0     Monday  223277.980000
    1    Tuesday  234441.879000
    2  Wednesday  246163.972950
    3   Thursday  241240.693491
    4     Friday  230143.621590

We will now calculate the average sales numbers of the two weeks:

    average = docu["week1"].copy()
    average["Sales"] = (docu["week1"]["Sales"] + docu["week2"]["Sales"]) / 2
    print(average)

OUTPUT:

         Weekday          Sales
    0     Monday  173355.480000
    1    Tuesday  178320.264600
    2  Wednesday  190291.244085
    3   Thursday  186485.419203
    4     Friday  179158.526297
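Adding the two columns by hand works well for two sheets. For a document with an arbitrary number of sheets, one possible alternative (not the approach used in this chapter) is to stack all sheets and let groupby compute the mean per weekday. The sketch below uses in-memory DataFrames with the first two rows of each sheet as stand-ins for the dictionary of sheets, so no Excel file is needed to run it.

```python
import pandas as pd

weekdays = ["Monday", "Tuesday"]

# Stand-ins for the sheets read from sales.xls (first two rows of each week)
docu = {
    "week1": pd.DataFrame({"Weekday": weekdays, "Sales": [123432.98, 122198.6502]}),
    "week2": pd.DataFrame({"Weekday": weekdays, "Sales": [223277.98, 234441.879]}),
}

# Stack all sheets, then average the rows that share a weekday;
# sort=False keeps the weekdays in their original order
stacked = pd.concat(list(docu.values()))
average = stacked.groupby("Weekday", sort=False)["Sales"].mean().reset_index()
print(average)
```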

We will save the DataFrame 'average' in a new document, with 'week1' and 'week2' as additional sheets as well:

    writer = pd.ExcelWriter('/data1/sales_average.xlsx')
    docu['week1'].to_excel(writer, sheet_name='week1')
    docu['week2'].to_excel(writer, sheet_name='week2')
    average.to_excel(writer, sheet_name='average')
    # close() writes the file to disk; the separate save() method of
    # older Pandas versions was removed in pandas 2.0
    writer.close()


Source: https://python-course.eu/numerical-programming/reading-and-writing-data-in-pandas.php
