Complete Data Structure in short

Introduction to Data Structures

Data structures are fundamental building blocks in computer science that organize data in a specific way to facilitate efficient access, manipulation, and storage.

Concepts of Data and Information

Data: Raw, unprocessed facts or values represented in a computer system.
Information: Data that has been processed and organized to convey meaning.

Classification of Data Structures

Data structures can be categorized based on their organization and operations:

Linear Data Structures: Elements arranged in a sequential order, allowing access one at a time.
- Arrays: Fixed-size contiguous memory locations to store elements of the same data type.
- Linked Lists: Dynamically allocated nodes containing data and a reference (pointer) to the next node, enabling variable-size data storage.
Non-Linear Data Structures: Elements arranged in a more complex, non-sequential manner.
- Trees: Hierarchical structures where nodes represent data and branches connect related nodes.
- Graphs: Collections of vertices (nodes) connected by edges, useful for modeling relationships.

Abstract Data Types (ADTs)

ADTs specify the operations (functions) that can be performed on a data structure without revealing its underlying implementation details. This promotes modularity and reusability.

Implementation Aspects: Memory Representation

Arrays: Stored in contiguous memory locations, providing efficient random access by index.
Linked Lists: Nodes are scattered in memory, connected by pointers. Accessing a specific node requires traversing the list from the beginning, making random access slower. However, linked lists are more memory-efficient for dynamic data collections.

Data Structures Operations and Cost Estimation

Common operations on data structures and their approximate time complexities:

Search: Finding a specific element.
- Arrays: O(1) (average) for random access.
- Linked Lists: O(n) in the worst case, as we might need to traverse the entire list.
Insertion: Adding a new element.
- Arrays: O(n) in the worst case, as existing elements might need to be shifted to make space (depends on insertion location).
- Linked Lists: O(1) (average) for insertion at the beginning or end, as only pointer manipulation is involved.
Deletion: Removing an element.
- Arrays: O(n) in the worst case, similar to insertion.
- Linked Lists: O(1) (average) for deletion at the beginning or end, but O(n) for deletion from the middle.

Introduction to Linear Data Structures

Arrays

Arrays offer efficient random access due to contiguous memory allocation.
Size is fixed at declaration, requiring careful planning for dynamic data needs.
Insertion or deletion in the middle can be expensive due to element shifting.

Linked Lists

Linked lists are dynamic, allowing for flexible data size.
Accessing a specific element requires traversing from the beginning.
Insertion or deletion at the beginning or end is efficient.

Representation of Linked List in Memory

Each linked list node typically consists of two fields:

Data Field: Stores the actual data value.
Link Field (Pointer): Holds the memory address of the next node in the list.

Different Implementations of Linked Lists

Singly Linked List: Nodes have a pointer to the next node.
Doubly Linked List: Nodes have pointers to both the next and previous nodes, enabling easier traversal in both directions.
Circular Linked List: The last node’s pointer points back to the first node, creating a continuous loop.

Applications of Linked Lists

Linked lists are versatile and can be used for various tasks, including:

Polynomial Manipulation: Represent polynomials as linked lists of terms (coefficients and exponents).
Undo/Redo Functionality: Implement stacks or LIFO (Last-In-First-Out) behavior using linked lists.
Graph Representation: Model relationships between nodes using linked lists.
Sparse Matrices: Represent sparse matrices efficiently by storing only non-zero elements in a linked list structure.

Stacks

Stacks as an Abstract Data Type (ADT)

A stack is a linear data structure that follows the Last-In-First-Out (LIFO) principle. It’s like a stack of plates: you can only add or remove plates from the top. The ADT for a stack defines the following operations:

push(x): Adds an element x to the top of the stack.
pop(): Removes and returns the top element from the stack.
peek(): Returns the top element without removing it. (Optional)
isEmpty(): Checks if the stack is empty.
isFull(): Checks if the stack is full (applicable for fixed-size implementations).

Different Implementations of Stacks

Array:
- Simplest implementation, uses an array to store elements.
- push(): Increments a top pointer and inserts the element at that index.
- pop(): Decrements the top pointer, returns the element at that index.
- Limitations: Fixed size, can lead to overflow/underflow issues.
Linked List:
- More flexible, uses linked list nodes to store elements.
- push(): Creates a new node, sets its data field to x, and links it to the current top node.
- pop(): Stores the top node’s data, updates the top pointer to the next node, and returns the stored data.
- Advantages: Dynamic size, no overflow/underflow issues.

Multiple Stacks

A single array or linked list can be partitioned to create multiple logical stacks.
Each stack maintains its own top pointer or head node.
Useful for managing different data types or functionalities within the same memory space.

Applications of Stacks

Conversion of infix to postfix notation: Stacks are used to temporarily store operands and operators during the conversion process.
Evaluation of postfix expression: The stack helps evaluate postfix expressions by processing elements from left to right.
Recursion: Function calls are stored on a stack during function execution, allowing for backtracking when the function returns.

Queues

Queues as an Abstract Data Type (ADT)

A queue is a linear data structure that follows the First-In-First-Out (FIFO) principle. It’s like a waiting line: the first person in line is served first. The ADT for a queue defines the following operations:

enqueue(x): Adds an element x to the back of the queue.
dequeue(): Removes and returns the element from the front of the queue.
isEmpty(): Checks if the queue is empty.
isFull(): Checks if the queue is full (applicable for fixed-size implementations). (Optional)

Different Implementations of Queues

Array:
- Similar to stack implementation, uses an array to store elements.
- enqueue(): Increments a rear pointer and inserts the element at the end (array index).
- dequeue(): Removes the element at the front (array index 0), decrements a front pointer. (May involve shifting elements)
- Limitations: Fixed size, can lead to overflow/underflow issues with shifting.
Linked List:
- More flexible, uses linked list nodes to store elements.
- enqueue(): Creates a new node, sets its data field to x, and appends it to the end of the list (updating the tail pointer).
- dequeue(): Removes the first node (head), stores its data, updates the head pointer to the next node, and returns the stored data.
- Advantages: Dynamic size, no overflow/underflow issues.

Circular Queue

A variant of the array-based queue that utilizes a circular approach.
The rear pointer wraps around the array when it reaches the end, allowing for efficient use of space.
Requires additional logic to handle empty and full conditions using both front and rear pointers.

Concept of Dequeue (Double-Ended Queue) and Priority Queue

Dequeue (Double-Ended Queue):
- Supports insertion and deletion from both ends (front and back).
- Can be implemented using two separate stacks or modifying a linked list structure.
Priority Queue:
- Elements have priorities associated with them.
- Elements with higher priorities are dequeued first (may violate FIFO order).
- Typically implemented using heaps (binary trees) or specialized linked list structures.

Queue Simulation

Queues are useful for modeling real-world scenarios like:

Job scheduling: Processes waiting for CPU time can be organized

Tree Definitions

Tree: A hierarchical data structure that simulates a tree-like structure with nodes connected by edges.

Node: A fundamental unit of a tree that stores data and references to other nodes.

Root: The topmost node with no parent.

Parent: A node connected to a child node by an edge.

Child: A node connected to a parent node by an edge.

Sibling: Nodes that share the same parent.

Leaf: A node with no children (sometimes called a terminal node).

Tree Terminology:

Height: The maximum number of edges from the root to the farthest leaf node.

Depth: The number of edges from a specific node to the root node (depth of root is 0).

Order: The maximum number of children a node can have (binary trees have 2, B-trees have a minimum and maximum degree).

Degree: The number of children a specific node has.

Binary Search Tree: A tree where each node has a value greater than all its left children and less than all its right children. This property allows for efficient searching and sorting.Operations:

Insert(x): Adds a new node with value x while maintaining the BST property.

Delete(x): Removes a node with value x while maintaining the BST property.

Search(x): Traverses the tree to find a node with value x.

Traversal:

Inorder: Visits nodes in left subtree, root, then right subtree (produces sorted order for BST).

Preorder: Visits root, then left subtree, then right subtree.

Postorder: Visits left subtree, then right subtree, then root.

Search:

Start at the root node.

If the current node’s value is equal to x, return it.

If x is less than the current node’s value, search the left subtree.

If x is greater than the current node’s value, search the right subtree.

AVL Tree: A self-balancing BST where the height difference between left and right subtrees of any node is at most 1. This ensures efficient search and insertion operations.

Requires additional operations to maintain balance after insertions and deletions.

Heap: A tree-based structure where each node has a value that is either:

Max Heap: Greater than or equal to all its children (used for priority queues).

Min Heap: Less than or equal to all its children (used for efficient sorting algorithms).

Heaps are not BSTs, but they can be implemented using binary trees.

Applications:

Searching and sorting: BSTs and heaps excel at efficient searching and sorting.

File systems: B-trees and B+-trees are used in file systems for efficient data organization due to their optimized disk access patterns.

Network routing: Tries are used in network routers for efficient routing table lookup.

Comparison:

BST: Efficient searching and sorting for sorted data, but insertion and deletion can become slow for unbalanced trees.

AVL Tree: Maintains balance for efficient insertions and deletions, but adds complexity compared to BST.

Heap: Excellent for priority queues and sorting algorithms, but not suitable for general searching.

B-Tree, B+-Tree, B-Tree:* Optimized for disk access patterns, used in file systems for large datasets.

Red-Black Tree: Similar to AVL Tree with slightly different balancing rules, offering good balance and performance.

Forest: A collection of separate, disjoint trees (not connected to each other at the root level).

Multi-way Tree: A tree where each node can have more than two children (generalization of binary tree).

B-Tree: A self-balancing multi-way tree optimized for disk access, with a minimum and maximum degree for nodes.

B+-Tree: Similar to B-Tree, but only stores keys in leaves, allowing for efficient data retrieval and sequential access.

B-Tree:* A variant of B-Tree with slightly different node splitting and insertion rules.

Red-Black Tree: A self-balancing binary search tree with specific node color properties that ensure logarithmic search, insertion, and deletion times.

Graphs

Graphs are powerful data structures used to model relationships between entities. They consist of:

Vertices (Nodes): Represent the individual entities or objects.
Edges: Connections between vertices, indicating a relationship. Edges can be:
- Directed: Specify a direction from one vertex to another (e.g., following someone on social media).
- Undirected: Represent a bidirectional relationship (e.g., friendship between two people).
- Weighted: Assigned a value representing a cost, distance, or strength of the relationship.

Classification of Graphs

Graphs can be further classified based on various properties:

Directed vs. Undirected: As mentioned above.
Simple vs. Multigraph: Simple graphs have only one edge between any two vertices, while multigraphs can have multiple edges.
Connected vs. Disconnected: A connected graph has a path between every pair of vertices. A disconnected graph has isolated components where vertices are not connected.
Cyclic vs. Acyclic: Acyclic graphs (DAGs) have no cycles (closed paths). Cyclic graphs can have loops.

Representation

Graphs can be represented in two common ways:

Adjacency Matrix: A two-dimensional array where rows and columns represent vertices. The value at a specific position (i, j) indicates the presence or weight of an edge between vertex i and vertex j.
Adjacency List: An array or linked list where each element represents a vertex and stores a list of its adjacent vertices and potentially edge weights.

Graph Traversal

Traversing a graph involves visiting each vertex exactly once. Common traversal algorithms include:

Depth-First Search (DFS): Explores as deeply as possible along a chosen path before backtracking and exploring other paths. Useful for finding connected components or topological sorting (DAGs).
Breadth-First Search (BFS): Explores all neighbors of a vertex at the current level before moving to the next level. Useful for finding shortest paths in unweighted graphs.

Graph Algorithms

These algorithms solve specific problems on graphs:

Minimum Spanning Tree (MST): Finds a subset of edges that connects all vertices with a minimum total weight. Kruskal’s and Prim’s algorithms are two popular approaches.
Dijkstra’s Shortest Path Algorithm: Finds the shortest path between a source vertex and all other vertices in a weighted graph.

Comparison of Graph Algorithms

Algorithm	Time Complexity	Suitable for
Kruskal’s	E log V (generally)	Finding MST in undirected, weighted graphs
Prim’s	E log V (generally)	Finding MST in undirected, weighted graphs
Dijkstra’s	V^2 (worst case)	Finding shortest paths in weighted graphs

Note: E represents the number of edges, and V represents the number of vertices in the graph. The actual time complexity can vary depending on the specific implementation and graph properties.

Applications of Graphs

Graphs have numerous applications across various domains:

Social Networks: Modeling relationships between people.
Navigation Systems: Representing roads and intersections.
Computer Networks: Modeling connections between devices.
Circuit Analysis: Representing components and their connections.
Scheduling: Modeling dependencies between tasks.
Recommendation Systems: Finding similar items based on user relationships.

Sorting

Introduction

Sorting is a fundamental operation in computer science that rearranges a collection of items according to a specific order (ascending or descending). It’s essential for efficient searching, data retrieval, and data analysis.

Sorting Methods

Here’s an overview of common sorting methods, along with their time and space complexities (average and worst cases):

Bubble Sort:
- Repeatedly compares adjacent elements, swapping them if they’re in the wrong order.
- Simple to implement, but inefficient for large datasets.
- Time Complexity: O(n^2) (average, worst)
- Space Complexity: O(1)
Selection Sort:
- Finds the minimum (or maximum) element and swaps it with the first (or last) element.
- Repeats the process for the remaining elements.
- Simpler than bubble sort but still inefficient for large datasets.
- Time Complexity: O(n^2) (average, worst)
- Space Complexity: O(1)
Insertion Sort:
- Maintains a sorted sublist and inserts each element from the unsorted part into its correct position in the sorted sublist.
- Efficient for small datasets or partially sorted data.
- Time Complexity: O(n^2) (worst), O(n) (average for nearly sorted data)
- Space Complexity: O(1)
Shell Sort (Improved Insertion Sort):
- Similar to insertion sort, but increments (or gaps) are used to jump ahead during insertion, reducing comparisons.
- More efficient than basic insertion sort for larger datasets.
- Time Complexity: O(n log n) (average, best case for specific gap sequences)
- Space Complexity: O(1)
Merge Sort:
- Divides the list recursively into sublists containing a single element.
- Merges the sublists back together in sorted order.
- Efficient for large datasets due to its divide-and-conquer approach.
- Time Complexity: O(n log n) (average, worst)
- Space Complexity: O(n) (uses extra space for merging)
Quick Sort:
- Picks a pivot element and partitions the list around it: elements less than the pivot to the left, and elements greater than the pivot to the right.
- Sorts the left and right sublists recursively.
- Efficient on average but can have poor performance for specific input sequences.
- Time Complexity: O(n log n) (average), O(n^2) (worst case)
- Space Complexity: O(log n) (average), O(n) (worst case)
Heap Sort:
- Uses a heap data structure to efficiently sort the elements.
- Builds a max-heap (largest element on top) from the input data.
- Repeatedly removes the maximum element (root) from the heap and adds it to the sorted list.
- Efficient for some scenarios but might be slower than merge sort or quick sort for general cases.
- Time Complexity: O(n log n) (average, worst)
- Space Complexity: O(1)
Radix Sort:
- Sorts elements based on individual digits (or characters) starting from the least significant digit.
- Performs multiple passes, sorting each digit position.
- Efficient for certain data types like integers (assuming fixed number of digits) or strings with similar length.
- Time Complexity: O(nk) (k is the number of digits/passes), can be faster than comparison sorts for specific data

Comparison of Sorting Techniques

The choice of sorting algorithm depends on various factors like:

Data size: Merge sort and quick sort (average) are generally efficient for large datasets.
Data type: Radix sort excels for integers with fixed digits.
Nearly sorted data: Insertion sort is efficient for already partially sorted data.
Space complexity: Some sorts (e.g., merge sort) use extra space compared to in-place sorts (e.g., bubble sort, selection sort).

Searching

Basic Search Techniques

Sequential Search (Linear Search):
- Examines each element in the list one by one until the target element is found or the end is reached.
- Simple to implement but inefficient for large datasets.
- Time Complexity: O(n) (worst case)
Binary Search:
- Applicable to sorted arrays or lists.
- Repeatedly divides the search space in half by comparing the target element with the middle element.
- If the target is less than the middle element, the search continues in the left half.
- If the target is greater than the middle element, the search continues in the right half.
- This process continues until the target is found or the entire search space is exhausted.
- Binary search is significantly faster than sequential search for sorted datasets.
- Time Complexity: O(log n) (average, worst)

Comparison of Search Methods

Sequential search: Simpler but slower, especially for large datasets.
Binary search: Much faster for sorted datasets but requires sorted input.

Hashing & Indexing

Hashing:
- A technique for fast data retrieval based on a key.
- A hash function maps a key (data) to a unique index (hash value) in a hash table.
- Similar keys are likely to have similar hash values, allowing for efficient retrieval.
- Useful for scenarios like finding items in a database based on an ID.
- Collisions (multiple keys mapping to the same hash value) can occur and need to be handled using techniques like separate chaining or open addressing.
Indexing:
- Creates an organized structure to facilitate faster data access.
- Like an index in a book, it points to specific data locations.
- Common indexing structures include B-trees and hash tables.
- Efficient for searching and retrieving data, especially for large datasets.

Case Study: Application of Various Data Structures in Operating Systems and DBMS

Operating Systems

Process Management: Processes can be represented using linked lists or trees to manage their state and relationships (parent-child processes).
Memory Management: Memory allocation can utilize stacks for function calls and heaps for dynamic memory allocation.
File Systems: B-trees are often used in file systems to efficiently manage file directories and data access.

Database Management Systems (DBMS)

Tables and Records: Tables can be represented using arrays or linked lists to store data records.
Indexes: B-trees or hash tables are used to create indexes for efficient searching and retrieval of data based on specific columns.
Query Processing: Trees (like B+-trees) are used to optimize query execution by efficiently navigating through indexed data.

Choosing the Right Data Structure

The selection of appropriate data structures depends on the specific needs of your program or application. Consider factors like:

Type of data: Arrays for homogeneous data, linked lists for dynamic data.
Operations: Stacks for LIFO (Last-In-First-Out), queues for FIFO (First-In-First-Out), trees for searching and sorting.
Performance: Hash tables and B-trees for efficient searching and retrieval.

Download as PDF