Skip to main content

How Wrong Use of Data Structure is Costing You Performance



Data structure is a specialise way of storing and organising data so that it can be access efficiently. Today a good chunk of our codes makes use of data structure, and the importance of using the proper data structure for the right job cannot be over emphasised. There are many types of data structure, each one designed to handle the storing and manipulation of data in a specific way. The trick here now is, knowing how to use data structure is not enough but knowing the type to use that’s suits your operation without sacrificing performance.  The determinant factor of the right data structure to use in your program is dependent on some factor like the type of operation you want to perform on the data set like Searching, addition, deletion and access by index.


Comparison between Basic Data Structures


Data Structure
Addition
Search
Deletion
Access by Index
Array(T[])
O(N)
O(N)
O(N)
O(1)
LinkedList(LinkedList<T>)
O(1)
O(N)
O(N)
O(N)
Dynamic array(List<T>)
O(1)
O(N)
O(N)
O(1)
Stack(stack<T>
O(1)
-
O(1)
-
Queue(Queue<T>)
O(1)
-
O(1)
-
Dictionary, implemented with a hash-table(Dictionary<k, T>)
O(1)
O(1)
O(1)
-


When to Use a Particular Data Structure
Array (T[])
Arrays is data structure consisting of collection of elements from a given type where the elements preserved their orders. Each elements can be accessed and identified through its index.


Adding a new element to Array


To add a new element to array is a slow operation because it involves the allocating of memory with same size and copying all the data from the former to the latter (new array). Array in this case, is not suited for program that involves the addition of large elements because it will drag your program to a halt.
Searching


Searching in an array is a slow operation and takes time because it involves comparing every element in the array to the searched value. In small application this may not seem like much, but the pain comes when you are dealing with a large data set.


Deletion
Deleting an element from an array it’s a very slow operation, this involve allocating a new memory with the same size, minus the number of element removed and copying all the old element except the removed ones. Array is not suited for deletion of element in a large data set, it takes time and will reduce the performance of your program.
Accessing by index
Accessing array from index is a very fast operation because it’s direct. Use array when you want to process a particular number of element to which you need access by index.



Use array only when you want to process a number of elements to which you need access by index.
Dynamic Array (List<T>)
Dynamic Array is a very popular data structure used in programming. The List<T> stores it’s element in an array that’s its size is bigger than the number of stored element. Dynamic array does not have fixed size and allows elements to be accessed through its index.


Searching in Dynamic Array


Searching in dynamic array is very slow because you have to search all through all the elements. Though this may seem trivial but will cause significant performance reduction in application that deals with large data set.


Deletion in Dynamic Array


Removing elements in Dynamic Array is a very slow operation, because it takes a linear time and this involves moving all elements after the deleted one with one position to the left.


Adding Elements to Dynamic Array


Dynamic Array stores its element in an array in which the size is bigger than the count of the stored element. When an element is added, there is an empty cell in its inner array and this process takes a constant time. Some times that array may be filled and this operation takes a linear time but this scenario is rare. For a large element, the case complexity of adding an element to a dynamic array will be a constant. Totalling the steps needed to add 100 elements for both case (fast add and add with expand) and dividing by 100 will obtain a constant that will be almost equivalent to adding 1,000 elements. In summary performing addition with a dynamic array is a very fast operation regardless of the number of elements.


Accessing by Index
Accessing elements by index is fast, because the elements are stored internally in an array.


Use Dynamic Array only when you want to add elements quickly and accessed them through index.


Stack


Stack is a linear data structure with 3 principal operation: Push which adds elements to the collection, POP to removed the most recently added element, PEEP inspecting the element from the top without removing it. Stack is a data structure that has LIFO behaviour( Last in, First Out). Use such data structure in situation like keeping the current path in a recursive search.

Use stack when you have to implement the model, Last in, First out(LIFO).
Queue


Queue is first in, first out and a linear data structure that involves adding an element to the queue(tail) and removing the front situated element from the head(dequeue), this two actions takes a constant time for execution. An instance of implementation of queue can be pointed out in the implementation of BFS(Breadth-First Search) algorithm, in which a search is started from the first element and all its neighbours are added to the queue and processed in the order that were added, and this action is continually iterated until the requirement is found.

Use Queue only when you want to implement First-in, First out model


Singly / Doubly Linked List (LinkedList<T>)


This is a data structure that holds a collection of elements, which preserve their order and consists of sequentially linked records called nodes. Their representation in memory is dynamic and pointer based.
Adding a new element to Singly / Doubly Linked List (LinkedList<T>)
Adding elements is fast but not to be compared to List<T>, because every time an element is added, a new memory is created.


Searching in Singly / Doubly Linked List (LinkedList<T>)
Searching is slow, because it has to search all element to get the desired one, while these may seem trivial with small data set but can really drag your program to halt when dealing with programs that processes large data set.


Accessing an element by Index
Accessing an element by index can be very slow, because elements in Singly / Doubly Linked List (LinkedList<T>) are not indexed, and you will have to traverse all the element instead.

Deletion
Removing an element in Singly / Doubly Linked List (LinkedList<T>)  is very slow, because there is no indexing, so to locate the element to remove involves searching through all other elements.

Use Singly / Doubly Linked List (LinkedList<T>)  when you need to add and remove elements at both end of the list.


Dictionary, Implemented with a Hash-Table (Dictionary<K, T>)


Dictionary stores key-value pair and provides a quick search by key. Dictionary uses an array to stores its element internally and the position of every element are determined by a hash- function.
Hash-table is known to be the fastest data structure that provides adding and searching by key. Choosing a bad hash-function may cause many collision and this will make the basic function to very inefficient. Hash-Table should be used when will need to add an element quickly and also access it quickly via its index


Hash-Table should be used when will need to add an element quickly and also access it quickly via its index



Comments

Popular posts from this blog

How to implement RESTful API Versioning in ASP.NET Web API 2 using IHttpRouteConstraint

The only thing constant in life is change, and that is proved everyday in our industry, API’s are cool to extend the functionality of your application and expose it to other developers. The cool thing about IT and software, it’s that things changes quite rapidly and so it’s the technology, hence technology can change and the needs of your organisation can change, hence in order to keep serving this evolving needs and keep been relevant, your api might need to change also. Small changes can be accommodated within the initial version, but changes that will risked breaking the existing code, will required the need for versioning.

Implementing a custom IHttpRouteConstraint

According to msdn, a IHttpRouteConstraint simply Represents a base class route constraint. What then is a route constraint? A route constraint simply gets or sets a dictionary of expressions that specify valid values for a URL parameter.

publicclassApiVersionRouteConstraint : IHttpRouteConstraint
  {

publicApiVersionRouteCo…

How to implement multi-tenancy with subdomains using Route Constraint in ASP.NET MVC

According to Wikipedia, The term "software multitenancy" refers to a software architecture in which a single instance of software runs on a server and serves multiple tenants. A tenant is a group of users who share a common access with specific privileges to the software instance. With a multitenant architecture, a software application is designed to provide every tenant a dedicated share of the instance - including its data, configuration, user management, tenant individual functionality and non-functional properties. Multitenancy contrasts with multi-instance architectures, where separate software instances operate on behalf of different tenants. By giving companies, access to a tenant through a subdomain of choice, will help to personalise the experience more and gives a sense of ownership to each tenant. This will go along way to bring consistency in there branding.
Implementing Route Constraint
You use route constraints to restrict the browser requests that match a partic…