Are you familiar with reading data files (.dat, .csv, etc.) line by line, but unsure about how to read an entire column of data at once? This article will provide a detailed explanation of how to read all the data in a specific column of a file.
Reading Bulk Data from a dat File
Preparing the dat File
We have prepared a data file named “average_temperature_kyoto_2018.dat” containing average monthly temperatures in Kyoto city for the year 2018.
# averaged temperature in 2018 @ Kyoto city
# 01: month 02: averaged temperature in the daytime
1 3.9
2 4.4
3 10.9
4 16.4
5 20.0
6 23.4
7 29.8
8 29.5
9 23.6
10 18.7
11 13.5
12 8.2
Save this file in the directory “Desktop/Dr.code/python/data-analysis/input_file_all”.
Writing the Code
Save a file named “input_file.py” in the same directory as “average_temperature_kyoto_2018.dat”. Let’s get straight to the point – the code for reading the file is as follows. It’s simpler to write than reading each line individually.
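import numpy as np
# Data file name, relative to the directory the script is run from
data_file = 'average_temperature_kyoto_2018.dat'
# Read column 0 (month); lines starting with '#' are treated as comments
month = np.loadtxt(data_file, comments='#', usecols = 0)
# Read column 1 (average temperature) with a second call
ave_temperature = np.loadtxt(data_file, comments='#', usecols = 1)
print(month)
print(ave_temperature)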
Running the Program
Let’s execute the program mentioned above. Open your terminal, navigate to “Desktop/Dr.code/python/data-analysis/input_file_all”, and run the following command:
python input_file.py
# (Output)
# [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.]
# [ 3.9 4.4 10.9 16.4 20. 23.4 29.8 29.5 23.6 18.7 13.5 8.2]
If all goes well, you should see the output shown above. Since the code has not been explained yet, let’s now look at what np.loadtxt() does.
Explanation of the Code
numpy.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None)
This is the signature of the numpy.loadtxt() function, which takes 11 parameters. You don’t need to fill in all 11; only fname is required, and you can add whichever others you need. The function reads the specified columns (usecols) from a data file (fname) and returns an array containing the loaded data. In other words, if you write col1 = numpy.loadtxt(fname, usecols=0), the data from the first column of the file named fname is stored in the array col1. Additionally, by writing import numpy as np, you can use the shorthand np.loadtxt().
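As a small illustration of the usecols and dtype parameters, the sketch below reads the first column twice: once with the default float type and once as integers. The in-memory text is a hypothetical stand-in for a data file (np.loadtxt also accepts file-like objects), so the snippet runs without any file on disk.

```python
import io
import numpy as np

# Hypothetical sample text standing in for a data file on disk
text = "# month temperature\n1 3.9\n2 4.4\n3 10.9\n"

# Default dtype is float, and a single column comes back as a 1-D array
col1 = np.loadtxt(io.StringIO(text), usecols=0)
print(col1.dtype, col1.shape)  # float64 (3,)

# Passing dtype=int returns the month numbers as integers instead
month = np.loadtxt(io.StringIO(text), usecols=0, dtype=int)
print(month)  # [1 2 3]
```

This is why the month column in the earlier output is printed as 1. 2. 3. rather than 1 2 3.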
Here are the details of each parameter of numpy.loadtxt() (summarized for your reference):
Parameter name | Type | Summary |
---|---|---|
fname | String, path, or file object | Specify the name or path of the file you want to read. An already opened file object also works. |
dtype | data type | (Optional) You can specify the data type of the output. The default is float. |
comments | String | (Optional) Specify the character indicating the start of a comment line. The default is ‘#’, as shown above. |
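delimiter | String | (Optional) The string used to separate values. The default is None, which splits on any whitespace. |
converters | dict | (Optional) Maps a column number to a function that converts the text in that column to a value, for example to handle missing or specially formatted entries. The default is None. |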
skiprows | Int | (Optional) Specify the number of initial rows to skip, including comment lines. The default is 0. |
usecols | Int or sequence of ints | (Optional) Specify which columns to load, counting from 0. The default is to load all columns. For example, usecols=(1,4,5) will load the 2nd, 5th, and 6th columns. |
unpack | Boolean | (Optional) Default is False. If set to True, the returned array is transposed, so each column can be assigned to its own variable. |
ndmin | Int | (Optional) The minimum number of dimensions for the returned array. Default is 0, and options are 0, 1, and 2. |
encoding | String | (Optional) The encoding used for decoding the input file. Default is ‘bytes’ in the signature shown above; NumPy 2.0 and later default to None. |
max_rows | Int | (Optional) Specifies how many lines to read after skipping rows. By default, it reads all lines. This can be useful to avoid reading the last few lines. |
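To make a few of these parameters more concrete, here is a minimal sketch of skiprows, max_rows, and converters. It uses a hypothetical in-memory sample instead of a file, and the Celsius-to-Kelvin conversion is only an illustrative choice, not part of the original example.

```python
import io
import numpy as np

# Hypothetical sample text standing in for a data file on disk
text = (
    "# month temperature\n"
    "1 3.9\n"
    "2 4.4\n"
    "3 10.9\n"
    "4 16.4\n"
)

# skiprows counts every line, including the comment line, so skiprows=2
# skips the comment and the row for month 1; max_rows=2 then reads two rows.
subset = np.loadtxt(io.StringIO(text), skiprows=2, max_rows=2)
print(subset.shape)   # (2, 2)
print(subset[:, 0])   # months 2 and 3

# converters maps a column index to a function applied to that column's text.
# Here column 1 is shifted from Celsius to Kelvin (illustrative only).
kelvin_table = np.loadtxt(
    io.StringIO(text),
    converters={1: lambda s: float(s) + 273.15},
)
print(kelvin_table[:, 1])
```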
Taking It a Step Further
By making use of the usecols and unpack parameters, you can read both columns with a single call to np.loadtxt(), which shortens the previous code by one line.
import numpy as np
data_file = 'average_temperature_kyoto_2018.dat'
month, ave_temperature = np.loadtxt(data_file, usecols = (0,1), unpack=True)
print(month)
print(ave_temperature)
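To see what unpack=True actually does, the following sketch compares it with loading the whole table as a two-dimensional array and slicing out the columns by hand. The sample text is a hypothetical stand-in for the data file.

```python
import io
import numpy as np

# Hypothetical sample text standing in for a data file on disk
text = "# month temperature\n1 3.9\n2 4.4\n3 10.9\n"

# Without unpack, the result is one 2-D array with a row per data line
table = np.loadtxt(io.StringIO(text))
print(table.shape)  # (3, 2)

# With unpack=True, the array is transposed, so each column
# can be assigned directly to its own variable
month, ave_temperature = np.loadtxt(io.StringIO(text), unpack=True)

# Both approaches give the same columns
print(np.array_equal(table[:, 0], month))            # True
print(np.array_equal(table[:, 1], ave_temperature))  # True
```

Either form reads the file only once, unlike the first version of the code, which called np.loadtxt() separately for each column.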
Reading Bulk Data from a CSV File
CSV File
We’ve prepared a file named “average_temperature_kyoto_2018.csv” for you. Just like before, place it in the “Desktop/Dr.code/python/data-analysis/input_file_all” directory. The contents are as follows:
# averaged temperature in 2018 @ Kyoto city
# 01: month 02: averaged temperature in the daytime
1,3.9
2,4.4
3,10.9
4,16.4
5,20.0
6,23.4
7,29.8
8,29.5
9,23.6
10,18.7
11,13.5
12,8.2
Sample Code
All it takes is changing np.loadtxt(data_file, comments='#', usecols=0) to np.loadtxt(data_file, comments='#', delimiter=',', usecols=0). Here’s the complete code; the output is the same as for the dat file.
import numpy as np
data_file = 'average_temperature_kyoto_2018.csv'
month = np.loadtxt(data_file, comments='#', delimiter=',', usecols = 0)
ave_temperature = np.loadtxt(data_file, comments='#', delimiter=',', usecols = 1)
print(month)
print(ave_temperature)
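The single-call form with unpack=True works for CSV data as well. The sketch below uses a hypothetical in-memory sample rather than the file itself, and also shows what happens if delimiter=',' is left out: each line is then treated as one whitespace-separated field such as "1,3.9", which cannot be converted to a float.

```python
import io
import numpy as np

# Hypothetical sample text standing in for a CSV file on disk
csv_text = "# month,temperature\n1,3.9\n2,4.4\n3,10.9\n"

# Read both columns in one call by combining delimiter and unpack
month, ave_temperature = np.loadtxt(
    io.StringIO(csv_text), delimiter=',', unpack=True
)
print(month)
print(ave_temperature)

# Without delimiter=',' the comma-separated line cannot be parsed as a number
try:
    np.loadtxt(io.StringIO(csv_text))
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```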