Python offers a variety of libraries for loading datasets, and the right choice depends on the format of the data you are working with.
Here are some popular Python libraries used for loading datasets:
1. Pandas:
Supported Formats: CSV, Excel, SQL, HDF5, JSON, and more.
Example:
import pandas as pd
# Load a CSV file into a DataFrame
df = pd.read_csv('your_dataset.csv')
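As a slightly fuller sketch of the Pandas readers, the snippet below loads the same small table once from CSV text and once from JSON text. The in-memory io.StringIO buffers and the column names (name, score) are assumptions made so the example runs without any files on disk; with real data you would normally pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory instead of a file on disk
csv_text = "name,score\nalice,90\nbob,85\ncarol,77\n"

# read_csv accepts a path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same records as JSON (a list of row objects)
json_text = (
    '[{"name": "alice", "score": 90},'
    ' {"name": "bob", "score": 85},'
    ' {"name": "carol", "score": 77}]'
)
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_csv.shape)
print(df_json["score"].mean())
```

Both calls return a DataFrame, so the rest of an analysis does not need to care which format the data came from.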
2. NumPy:
Supported Formats: NumPy arrays can be created from various sources, such as Python lists and tuples, but the array constructor itself is not a dataset loader in the way the Pandas readers are.
Example:
import numpy as np
# Create a NumPy array from a list
data = np.array([1, 2, 3, 4, 5])
3. NumPy’s loadtxt and genfromtxt:
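Supported Formats: Text files containing numerical data. loadtxt expects every row to have the same number of values, while genfromtxt can also handle missing values.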
Example:
import numpy as np
# Load data from a text file
data = np.loadtxt('your_data.txt')
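The example above covers loadtxt only, so here is a minimal sketch of genfromtxt on data with a gap in it. The comma-separated text and the io.StringIO buffer are assumptions used to keep the example self-contained; a file path works the same way.

```python
import io

import numpy as np

# Hypothetical comma-separated data with one missing value in the second row
text = "1.0,2.0,3.0\n4.0,,6.0\n7.0,8.0,9.0\n"

# For float data, genfromtxt fills missing entries with nan by default
data = np.genfromtxt(io.StringIO(text), delimiter=",")

print(data.shape)
print(np.isnan(data[1, 1]))
```

The empty field is read as nan rather than causing an error, which is the main reason to reach for genfromtxt when a text file is not perfectly clean.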
4. Scikit-learn:
Supported Formats: Scikit-learn ships with small built-in datasets that are commonly used for machine learning tasks, along with functions to load them.
Example:
from sklearn.datasets import load_iris
# Load the Iris dataset
iris = load_iris()
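The object returned by load_iris bundles the features, labels, and their names together. The sketch below shows one way to pull those pieces apart; the variable names X and y are a common convention, not something the library requires.

```python
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data    # feature matrix, one row per flower
y = iris.target  # integer class label for each row

print(X.shape, y.shape)
print(iris.feature_names)
print(list(iris.target_names))
```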
5. OpenCV:
Supported Formats: Primarily used for image and video data.
Example:
import cv2
# Read an image file; imread returns None if the file cannot be read
img = cv2.imread('your_image.jpg')
6. TensorFlow and PyTorch:
Supported Formats: These libraries provide tools for loading datasets for deep learning tasks, either from in-memory arrays or from files, often in custom formats.
Example:
import tensorflow as tf
# Build a TensorFlow dataset from in-memory data
# (your_data is a placeholder for an existing array or tensor)
dataset = tf.data.Dataset.from_tensor_slices(your_data)
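For the PyTorch side, a rough equivalent is to wrap in-memory tensors in a TensorDataset and iterate over them in batches with a DataLoader. The tensor contents, sizes, and batch size below are made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical in-memory data: 6 samples with 2 features each, plus labels
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
labels = torch.tensor([0, 1, 0, 1, 0, 1])

# Pair each feature row with its label
dataset = TensorDataset(features, labels)

# Iterate in fixed-size batches, keeping the original order
loader = DataLoader(dataset, batch_size=2, shuffle=False)

batches = list(loader)
print(len(batches))
print(batches[0][0].shape)
```

During training, shuffle=True is the more usual setting; it is turned off here only so the batches are predictable.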
7. H5py:
Supported Formats: HDF5 format.
Example:
import h5py
# Load data from an HDF5 file
with h5py.File('your_data.h5', 'r') as hf:
    # [:] reads the whole dataset into a NumPy array
    data = hf['dataset_name'][:]
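Reading an HDF5 file only works if the file and the named dataset already exist. The sketch below first writes a small array and then reads it back, so it can be run as-is; the file name example.h5 and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, created in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.h5")

    # Write a small array to a new HDF5 file
    with h5py.File(path, "w") as hf:
        hf.create_dataset("dataset_name", data=np.arange(6).reshape(2, 3))

    # Read it back; [:] copies the whole dataset into a NumPy array
    with h5py.File(path, "r") as hf:
        data = hf["dataset_name"][:]

print(data.shape)
```

Because the slice copies the values into memory, the array stays usable after the file has been closed.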