
DBMS Interview Questions and Answers for 2024

In this article, we cover some frequently asked DBMS interview questions to help you understand the types of questions you might face in a database management system (DBMS) interview. There are several job roles in the database management field, and with proper preparation you can apply for them, including Application DBA, Application Developer, Monitoring User, Database Administrator, Support Specialist, and Security Analyst. The questions and answers below will help you prepare for interviews for these roles.


Beginner

A Database Management System can be said to be a computerized data-keeping system. A user of this system can perform multiple types of operations on the system either to manipulate the data in the database or to manage the database structure itself. Database management systems are classified according to their structures and types.

A primary key is the principal identifier for each row of a table, based on one or more specific columns. A primary key implements a technical mechanism to ensure that each row has a unique, non-empty value in the primary key column. For example, if you try to add a new row with a duplicate value in the primary key column, the row will be rejected by the primary key constraint.  

Another feature of primary keys is that NULL values are not allowed in primary key columns. Consider the example of a Students table where studentId is the primary key. In other words, a primary key guarantees that every row in the table has a unique, non-null value in the primary key column(s).

You can add a primary key to a table during or after table creation if the primary key does not already exist. If columns that are part of a primary key do not have nullability defined, the Database Engine automatically configures them as NOT NULL.  

SQL Server supports many ways to define a primary key when creating a table. If you want to define a primary key on only one column, you can include it as part of the column definition as shown in the following example.   

CREATE TABLE Students 
( 
studentId INT PRIMARY KEY, 
studentName VARCHAR(20) NOT NULL 
); 

In the above example, a table named Students is created using a SQL statement, and the studentId column is defined as the primary key.  

A primary key should ideally use a compact data type such as an integer or a short fixed-width character type. Uniqueness is essential: a primary key must have unique values, meaning no two rows in the table may contain the same value in the key column(s). Columns with these characteristics are good candidates for a primary key.

While assigning a primary key to a column, we need to keep both data integrity and query performance in mind. A primary key not only guarantees the uniqueness of each row in a table, but it also helps in querying data and maintaining the database.
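
A primary key can also be added after the table has been created. Here is a minimal sketch, assuming the Students table was originally created without a primary key (the constraint name PK_Students is an illustrative choice):

-- Add a primary key to an existing table
ALTER TABLE Students
ADD CONSTRAINT PK_Students PRIMARY KEY (studentId);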

A PRIMARY KEY uniquely identifies each record (row) of a table, whereas a UNIQUE KEY enforces uniqueness on a column or set of columns.  

Here is the difference between PRIMARY KEY and UNIQUE KEY: 

  • Each table can have only one PRIMARY KEY, but each table can have multiple UNIQUE KEYs. 
  • PRIMARY KEY does not allow NULL values, but UNIQUE KEY allows a single NULL value.  
  • A PRIMARY KEY cannot be created on a nullable column, but a UNIQUE KEY can be created on a nullable column.  
  • A PRIMARY KEY creates a clustered index by default, whereas a UNIQUE KEY creates a non-clustered index by default; in both cases this default can be overridden.  
  • Because the clustered index created for a PRIMARY KEY stores the table data itself, no separate structure is needed to hold the key values, whereas a UNIQUE KEY is backed by a separate non-clustered (secondary) index that stores the key values. 
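
As a minimal sketch of how the two constraints look side by side (the Employees table and its columns are illustrative):

CREATE TABLE Employees 
( 
empId INT PRIMARY KEY,        -- one primary key per table, NULL not allowed 
email VARCHAR(100) UNIQUE,    -- unique key, a single NULL is allowed 
phone VARCHAR(20) UNIQUE      -- a table may have several unique keys 
); 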

DBMS is especially useful for managing large volumes of data. Using a DBMS, we can easily change many data values with very few statements. There are many advantages of database management systems, including the following: 

  • Data Sharing: Data from one database can be easily shared by multiple users at the same time. This sharing also allows end users to respond quickly to the changes in the database environment. 
  • Integrity Constraints: Because of such constraints, data can be stored in an organized and elegant manner. 
  • Database Redundancy Control: A DBMS reduces data redundancy by providing a mechanism to consolidate all data into a single database. 
  • Data Security: DBMS provides the tools necessary to make data storage and transmission more secure and reliable. Authentication and encryption are common tools used to secure data in DBMS.  
  • Backup and Restore: A DBMS provides backup and restore functionality. It can be configured to automatically back up data and restore it when needed.

This is one of the most frequently asked DBMS interview questions.  

There are various languages present in DBMS like Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), Transaction Control Language (TCL), etc. 

  • Data Definition Language (DDL) - This language contains the commands that are required to define a database. Commands like CREATE, ALTER, DROP, TRUNCATE, RENAME, etc. 
  • Data Manipulation Language (DML) - This contains the commands necessary to manipulate data that resides in a database. Examples: SELECT, UPDATE, INSERT, DELETE, etc. 
  • Data Control Language (DCL) - Data Control Language contains the commands necessary to handle user privileges and control of the database system. For example: GRANT and REVOKE. 
  • Transaction Control Language (TCL) - Transaction Control Language contains the commands necessary for database transactions like COMMIT, ROLLBACK and SAVEPOINT. 
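
A minimal sketch showing one representative statement per language category (the Courses table and the user name some_user are illustrative):

-- DDL: define structure 
CREATE TABLE Courses (courseId INT PRIMARY KEY, title VARCHAR(50)); 

-- DML: manipulate data 
INSERT INTO Courses (courseId, title) VALUES (1, 'Databases'); 

-- DCL: manage privileges 
GRANT SELECT ON Courses TO some_user; 

-- TCL: control transactions 
BEGIN TRANSACTION; 
UPDATE Courses SET title = 'DBMS' WHERE courseId = 1; 
COMMIT; 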

One big issue is data access. Without indexing, the only option is a full file scan, which is slow. Another problem is redundancy and inconsistency: files often contain duplicate data, and updating one copy without the others makes the data inconsistent, while keeping files consistent is very expensive. The next problem is the lack of concurrency control: a file system typically locks an entire file for a single operation, whereas a DBMS allows multiple concurrent operations on the same data.  

Data isolation, integrity checking, transaction atomicity, and security issues are other drawbacks of traditional file-based systems, for which DBMS has provided some excellent solutions. 

Expect to come across this popular DBMS viva question this time around.

A database model is the logical structure of a database that describes the relationships and constraints for storing and accessing data. There are some common database models. The models are:  

  • Relational Model 
  • Hierarchical Model 
  • Entity Relationship Model (ER) 
  • Document Model 
  • The Object-Oriented Model (OO) 

While choosing a database model, we have to consider several factors. Each database model has its own strengths, and the best choice varies with the application. For example, the document model is suitable for text or semi-structured data, whereas if we have well-structured, atomic data, the relational model is usually the best option. 

It also depends on the DBMS we are using. Many DBMSs are designed to work only with a particular model, which leaves the user with no other options. 

NoSQL refers to a group of databases created for specific data models such as graphs, documents, key-value pairs, and wide columns. NoSQL databases have a flexible schema, which differs from relational databases. They are widely known for their ease of development, functionality, and scalable performance. Unlike relational databases, many NoSQL databases can scale horizontally across many servers. 

NoSQL systems are considered very new compared to traditional relational databases. Still, many innovations and performance improvements have increased their popularity. 

In addition to all the benefits of these systems, we have to be aware that NoSQL databases generally do not offer the same level of data integrity as relational databases. This is because many NoSQL systems sacrifice ACID properties in favor of speed and scalability.

ACID stands for Atomicity, Consistency, Isolation, and Durability in DBMS. These are the properties that ensure data can be exchanged safely among multiple users. Let's look at each of them. 

  • Atomicity – Atomicity is known as the all-or-nothing rule. Either all parts of a transaction are saved or none are; partial transactions are not allowed. For example, if a transaction debits money from one account and credits it to another, then all parts of the transaction must complete to maintain database consistency. Partial execution of this transaction would leave the database in an inconsistent state. 
  • Consistency - A database should be in a consistent state before a transaction begins and must remain consistent after the transaction completes; every transaction moves the database from one valid state to another. 
  • Isolation - Many transactions run at the same time, and the intermediate state of each transaction should not be visible to other transactions. For example, in the money transfer transaction we used for atomicity, other transactions must see the money in one account or the other, but never in both or in neither. In other words, transactions running in complete isolation from each other appear to run serially, not concurrently. 
  • Durability - Once a transaction is successfully committed, it must be persisted (saved to disk) and cannot be undone in case of a crash.

A must-know for anyone heading into a database engineer interview, this is one of the most frequently asked DBMS interview questions. 

An execution plan is a description of the operations that the database engine plans to perform to efficiently execute a query. It can be thought of as the output of the DBMS query optimizer, which is essentially software that searches for the most efficient way to execute the query. Execution plans are the primary means of troubleshooting poorly performing queries.  

Reading an execution plan, understanding it, and using it to troubleshoot is therefore an essential skill.

SQL is not a procedural language; it is a declarative language. In other words, you tell the DBMS what you want, but you do not tell it how to get those results; it is up to the DBMS to work that out. A DBMS can apply multiple query strategies to get the correct result, but these execution plans have different costs. It is the query optimizer's job to compare these strategies and choose the one with the lowest estimated cost. Cost in this context is a weighted combination of input/output and processing costs, where input/output cost is the cost of accessing index and data pages from disk. 
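
How you inspect an execution plan is vendor-specific. As a hedged sketch, many databases expose an EXPLAIN command, and SQL Server offers SHOWPLAN options (the Students table is the one used earlier):

-- MySQL / PostgreSQL style 
EXPLAIN SELECT studentName FROM Students WHERE studentId = 42; 

-- SQL Server style: show the estimated plan for the following statement 
SET SHOWPLAN_ALL ON; 
SELECT studentName FROM Students WHERE studentId = 42; 
SET SHOWPLAN_ALL OFF; 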

There are many ways to improve query performance. Here are a few best practices:  

  • Avoid unnecessary joins in a single query. 
  • Use joins in place of subqueries (see the sketch after this list). 
  • Use stored procedures for commonly used data and more complex queries. 
  • Use a WHERE clause to limit the result size as much as possible.
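
As a minimal sketch of the "joins instead of subqueries" advice, the two queries below are assumed to run against the Students table from earlier plus a hypothetical Enrollments table:

-- Subquery version 
SELECT studentName 
FROM Students 
WHERE studentId IN (SELECT studentId FROM Enrollments WHERE courseId = 10); 

-- Equivalent join version, often easier for the optimizer to handle 
SELECT DISTINCT s.studentName 
FROM Students s 
JOIN Enrollments e ON e.studentId = s.studentId 
WHERE e.courseId = 10; 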

This is one of the most frequently asked DBMS interview questions for freshers. Here is how to answer this.

  • Entity: An entity is a real-world object with attributes that are just properties of that particular object. For example, a student can be an entity. This particular entity can have attributes like studentid, studentname, etc. 
  • Entity Type: An entity type is a collection of entities that share the same attributes. An entity type generally corresponds to one or more related tables in a database; it can be understood as the set of attributes that characterize its entities. For example, an Employee entity type can have attributes such as empid, empname, and department. 
  • Entity Set: An entity set is the collection of all entities of a particular entity type in the database. For example, the set of all employees, the set of all companies, or the set of all people each forms an entity set. 

Data abstraction in a DBMS is the process of hiding irrelevant details from the user. Database systems are built from complex data structures, so to simplify user interaction with the database, those details are hidden. For example, most users prefer a system with a simple GUI and no visible complexity. Data abstraction therefore presents only the information a user needs and makes the data easy to access. Additionally, data abstraction divides the system into different layers so that the work at each layer is clearly specified and defined. 

Here are the three levels of data abstraction:  

  • Physical Level – It is the lowest level of abstraction. This level describes how to store the data. 
  • Logical Level – The next highest level of abstraction. This describes the data stored in the database and the relationships between that data. 
  • View Level – It is the highest level of data abstraction. This explains only part of the whole database. 

Let's understand this in more detail with the help of an example. A user interacts with the system through a GUI and enters the required details, but has no idea how the data is stored or used, so the view level provides a very high level of abstraction. The next level, the logical level, is for programmers: fields and records are visible here, so the level of abstraction is lower. Finally, the physical level describes how the memory blocks are actually written. 

Data independence means that applications are independent of data storage structures and access strategies: the schema definition at one level can be changed without changing the schema definition at the next higher level.  

There are different kinds of data independence. Let’s see them: 

  • Physical Data Independence – Physical data is the data as it is actually stored in the database, in bit format. Physical data independence means that changes at the physical level (for example, moving files, changing storage structures, or adding indexes) should not affect the logical level, i.e., the logical structure of the tables. 
  • Logical Data Independence – Logical data describes the structure of the data, such as the definition of a table stored in the database. Logical data independence means that changes at the logical level (for example, adding a column or splitting a table) should not affect the view level, i.e., what applications and users see.  

Relational algebra is a procedural query language containing a set of operations that take one or two relations as input and produce a new relation. Relational algebra is the basic set of operations for the relational model. The essence of relational algebra is that it is similar to the algebra used for manipulating numbers. Some basic operations in relational algebra are select, project, union, set difference, rename, etc. 
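
As a small illustration, the select (σ) and project (π) operations map naturally onto SQL; the query below against the earlier Students table is one hedged way to show the correspondence:

-- Relational algebra: π studentName ( σ studentId > 100 (Students) ) 
SELECT studentName 
FROM Students 
WHERE studentId > 100; 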

Relational calculus is a non-procedural query language that uses mathematical predicate calculus rather than algebraic operations. It has nothing to do with the calculus of differentiation and integration; because it is based on predicate logic, it is also called predicate calculus. There are two types of relational calculus: tuple relational calculus and domain relational calculus. 

A relation schema is the structure that describes how data is organized in a table: it consists of the name of the table and the names of its attributes. The schema itself contains no data. 

A relation is defined as a set of tuples over the attributes of a relation schema, with a key attribute identifying each tuple. For example, let r be a relation containing the set of tuples (t1, t2, t3, ..., tn), where each tuple is an ordered list of n values t = (v1, v2, ..., vn). 

The degree of a relation is the number of attributes in its relation schema. The term "degree of a relationship" is also used in the ER model, where it relates to cardinality: the frequency of occurrence of one entity relative to another. The common cardinalities are one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N). 

The Relationship is known as the interrelation between two or more entities. There are different types of relationships between the tables in DBMS. Let’s discuss them in detail. 

  • One to One Relationship - This type of relationship applies when a particular row of table X is related to a single row of table Y. 

  • One to Many Relationship - This type of relationship applies when one row of table X is related to many rows of table Y. 

  • Many to Many Relationship - This type of relationship applies when multiple rows of table X can be related to multiple rows of table Y. 
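
A minimal sketch of how these relationships are typically modelled with foreign keys (the Courses and Enrollments tables here are illustrative, alongside the Students table defined earlier):

-- A second table to relate to the Students table defined earlier 
CREATE TABLE Courses (courseId INT PRIMARY KEY, title VARCHAR(50)); 

-- Many-to-many between Students and Courses via a junction table; 
-- each foreign key on its own also models a one-to-many relationship 
CREATE TABLE Enrollments 
( 
studentId INT REFERENCES Students(studentId), 
courseId INT REFERENCES Courses(courseId), 
PRIMARY KEY (studentId, courseId) 
); 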

It is highly likely that you could come across this popular DBMS basic interview question. 

E-R Model is an acronym for Entity Relationship Model. It is seen that this model is based on real-world examples. It contains the required objects (called entities) and the relationships between those objects. Here, the main objects are an entity, an attribute of that entity, a set of relationships, and the attributes of that set of relationships can be mapped in the form of an E-R diagram.

In an E-R diagram, entities are represented by rectangles, relationships by diamonds, and attributes (the features of an entity) by ellipses; lines connect entities to their relationships and attributes.

Intermediate

There is a major difference between intension and extension. Let’s discuss it one by one. 

  • Intension – Intension, also commonly known as the database schema, is used to define the description of the database. It is set during the database design, and it rarely changes afterwards. 
  • Extension – Extension refers to the set (and number) of tuples present in the database at a given point in time. A database extension is also known as a database snapshot. Its value changes whenever tuples in the database are created, updated, or deleted. 

Data is collected (extracted, transformed, and loaded) from heterogeneous sources and stored centrally. A data warehouse can be thought of as a central repository that receives data from transactional systems and other relational databases. Data from across the business can be correlated to provide better insight into business performance. A data warehouse is the core of business intelligence, a system for data analysis and reporting.

This database is maintained separately from the standard production database. They are two separate systems: the production system is optimized for fast, accurate updates of real-time data, while the data warehouse is primarily suited for offline analytical work and provides a long-term view of the data over time.


In general, it is fair to say that locks are primarily used to ensure that only one user/session can update a particular piece of data. Two types of locks can be discussed here: shared locks (S) and exclusive locks (X). These locks can be held on tables, pages, index keys, or single rows. 

  1. Shared Lock - When an operation requests a shared lock on a table, the table is opened for reading, if permitted. This lock can be shared with other read operations to read the table concurrently. 
  2. Exclusive Lock - When an operation acquires an exclusive lock on a table, it has exclusive write access to that table. Other operations requesting access to the locked table are blocked until the lock is released. 

There is another concept related to locks called intent (I) locks. There are intent shared (IS) locks and intent exclusive (IX) locks. These locks allow finer-grained concurrency control. Technically we don't need them. S and X locks are sufficient, but they help optimize queries. 

This is a regular feature in DBMS interview questions for freshers, be ready to tackle it.

The indexes are used to speed up the query process. Without them, the DBMS would have to do a full table scan, which would be very time-consuming.  

A clustered index points to the physical storage of data. We can basically ask the DBMS to sort the rows by columns and physically store them in that order. As we can see, only one clustered index can be created per table. A clustered index provides fast sequential scans, so data can be retrieved very quickly. We can either create a custom clustered index or let the DBMS automatically create one clustered index using the primary key.  

However, non-clustered indexes have nothing to do with physical storage order. These indexes are sorted on the indexed columns and stored outside the table. We can think of such an index as a reference table with two columns: one holds the ordered values of the indexed column(s), and the other holds the physical address (row locator) of the corresponding row. When looking up a record with such an index, we first find the entry in the lookup structure and then go to physical storage to fetch the row at that address. 

In summary, non-clustered indexes are slower than clustered indexes because they involve an extra search step. Additional storage space is also required as these lookup tables must be stored. Another difference is that we can have one clustered index per table, while we can have as many non-clustered indexes as we want.
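
A minimal sketch of creating both kinds of index, using SQL Server-style syntax on the Students table from earlier (the index names are illustrative, and the clustered index assumes the table does not already have one from its primary key):

-- Clustered index: determines the physical order of rows (one per table) 
CREATE CLUSTERED INDEX IX_Students_Id ON Students (studentId); 

-- Non-clustered index: a separate lookup structure pointing back to the rows 
CREATE NONCLUSTERED INDEX IX_Students_Name ON Students (studentName); 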

With respect to inter-dependencies, there are two types of sub-queries. When the inner query depends on values from the outer query, it is called a "correlated" sub-query; when the inner and outer queries are independent of each other, it is called a "non-correlated" sub-query.

Correlated sub-queries tend to be slow because the inner sub-query must be executed once for each row of the outer query.
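
A minimal sketch of the two kinds, again assuming the Students table from earlier and an illustrative Enrollments table:

-- Non-correlated: the inner query can run once, independently of the outer query 
SELECT studentName FROM Students 
WHERE studentId IN (SELECT studentId FROM Enrollments); 

-- Correlated: the inner query references the outer row (s.studentId), 
-- so it is logically re-evaluated for every row of Students 
SELECT studentName FROM Students s 
WHERE EXISTS (SELECT 1 FROM Enrollments e WHERE e.studentId = s.studentId); 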

There are three main algorithms for performing JOINs:

  1. Nested Loop: This algorithm compares every row of the outer table with every row of the inner table. It is the only algorithm that can evaluate arbitrary join conditions, including cross joins, and it serves as a fallback when no better algorithm applies.  
  2. Hash Join: This algorithm is the most versatile join method. It builds an in-memory hash table from the smaller of the two inputs, then reads the larger input and probes the in-memory hash table to find matches. Hash joins can only be used to compute equijoins. This is generally more efficient than a nested loop, except when the probe side of the join is very small.  
  3. Sort-Merge Join: This algorithm first sorts the two tables on their join attributes, then scans both sorted inputs in parallel, merging the rows whose join attribute values match. 

Expect to come across this popular DBMS interview question more often than others.

A staple in database interview questions, be prepared to answer this one.  

  • DELETE Command: This command removes rows from a table based on the conditions in the WHERE clause; only the rows matching the WHERE clause are deleted. The operation can be rolled back if necessary. DELETE logs each deleted row and acquires row locks before deleting, which slows it down.  
  • TRUNCATE Command: This command removes all data from a table; it is like a DELETE command without a WHERE clause. It generally cannot be rolled back (truncation can be undone in some databases, depending on the version, but it is difficult and can lead to data loss). TRUNCATE does not log individual row deletions; it deallocates the table's data pages in one operation, which is why it is fast. (A short example of both commands follows this list.)
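
A short sketch of the two commands on the Students table from earlier:

-- DELETE: removes only the rows matching the WHERE clause, row by row 
DELETE FROM Students WHERE studentId = 42; 

-- TRUNCATE: removes every row by deallocating the data pages 
TRUNCATE TABLE Students; 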

A cursor is a tool that allows the user to process results row by row. This is in contrast to a typical SELECT statement, which returns the entire result set at once. Although we are usually interested in the complete set of rows, some applications, especially interactive and online applications, do not work effectively with the entire result set as a single unit; they prefer to process data row by row, and cursors allow this row-by-row processing.  

In the example below we can understand the whole process of using cursors. 
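
Here is a minimal T-SQL-style sketch, assuming the Students table defined earlier (the cursor name and variable are illustrative):

DECLARE @name VARCHAR(20); 

DECLARE student_cursor CURSOR FOR 
SELECT studentName FROM Students; 

OPEN student_cursor; 
FETCH NEXT FROM student_cursor INTO @name; 

WHILE @@FETCH_STATUS = 0 
BEGIN 
    PRINT @name;                               -- process one row at a time 
    FETCH NEXT FROM student_cursor INTO @name; 
END; 

CLOSE student_cursor; 
DEALLOCATE student_cursor; 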

Sharding is basically a horizontal split architecture. Each shard has the same schema and columns but holds different rows, and the shards are independent of each other.  

The main advantage of sharding is scalability. With an auto-sharding architecture, we can easily add machines to the stack as needed. Also, it is useful for offloading existing machines to handle more traffic and faster processing. This is attractive for growing applications.  

There are some advantages of database sharding: 

  • It solves scalability issues. 
  • It has high availability and speedy query response time. 
  • It has more write bandwidth.

One of the most frequently posed DBMS interview questions and answers, be ready for it. 

A checkpoint is a mechanism by which all previous logs are removed from the system once the changes they describe have been permanently written to the storage disk.  

There are two main techniques a DBMS uses to support recovery and maintain the ACID properties: keeping a log of each transaction, and maintaining shadow pages. Checkpoints belong to log-based recovery. A checkpoint is a point to which the database engine can return after a crash; committed work up to that point is known to be on disk, and anything after it can be recovered using the transaction log records.  

A checkpoint is like a snapshot of the DBMS state. Checkpoints allow the DBMS to reduce the amount of work required when restarting after a crash: they are used in log-based recovery systems, and if the system must be rebooted after a crash, recovery resumes from the most recent checkpoint instead of replaying the log from the very beginning.

  1. Normalization: Normalization is the process of reducing redundancy by organizing data into multiple related tables. Normalization improves disk space utilization and helps maintain database integrity.  
  2. Denormalization: Denormalization is the reverse process, combining normalized tables into a single table for faster data retrieval. JOIN operations can be used to produce a denormalized view of the data, effectively reversing the normalization.  

There are majorly seven types of keys in the database. Let us discuss them in more detail as below. 

  1. Candidate Key: A candidate key shows a set of properties that can uniquely identify a table. Each table can have more than one candidate key. One key out of all candidate keys can be selected as the primary key. For example, studentId and firstName can be considered candidate keys because they can uniquely identify each tuple.  
  2. Super Key: A super key defines a set of attributes that can uniquely identify a tuple. Candidate keys and primary keys are subsets of super keys. So, the super key is a superset of them.  
  3. Primary Key: A primary key defines a set of attributes used to uniquely identify each tuple. For example, studentId and firstName are candidate keys and either can be selected as the primary key. Here, studentId is the primary key. 
  4. Unique Key: A unique key is very similar to a primary key, except that a primary key does not allow NULL values in its columns whereas a unique key does. In other words, a unique key is essentially a primary key that also permits NULL values. 
  5. Alternate Keys: All candidate keys not selected as primary keys are considered alternate keys. In the example, firstname and lastname are alternate keys in the database. 
  6. Foreign Key: A foreign key is an attribute (or set of attributes) in one table whose values must match the values of an attribute in another table. In the example, courseId in the Student table is a foreign key referencing the Course table because both tables contain courseId as one of their attributes.  
  7. Composite Key: A composite key refers to the combination of two or more columns that can uniquely identify each tuple in a table. In the example, we can group studentId and firstname to uniquely identify each tuple in the table. 
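
A small sketch tying several of these key types together; the Student and Course tables below are illustrative:

CREATE TABLE Course 
( 
courseId INT PRIMARY KEY,                     -- primary key 
title VARCHAR(50) 
); 

CREATE TABLE Student 
( 
studentId INT, 
firstName VARCHAR(20), 
email VARCHAR(100) UNIQUE,                    -- unique key 
courseId INT REFERENCES Course(courseId),     -- foreign key 
PRIMARY KEY (studentId, firstName)            -- composite primary key 
); 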

Expect to come across this popular DBMS viva question, as this tests your basic concepts.

A deadlock is a situation in which one process waits for a resource held by another waiting process, which in turn is waiting for a resource held by the first. This is a common problem in multiprocessing, where several processes share mutually exclusive resources protected by software locks.

Example of deadlock:

A real-world example is traffic on a narrow bridge that carries cars in only one direction at a time. Think of the bridge as the resource. A deadlock can be resolved if one car backs up (resource preemption and rollback); in a bad situation, several cars may have to back up, and at that point starvation becomes possible.

A DBMS timestamp-based protocol is an algorithm that uses the system time or a logical counter as a timestamp. It helps to serialize the execution of concurrent transactions. This protocol ensures that conflicting reads and writes are performed in timestamp order.  

For example: Suppose we have three transactions such as X1, X2, X3. 

  • X1 entered the system at 0001 
  • X2 entered the system at 0002 
  • X3 entered the system at 0003 
  • In this case, transaction X1 has priority over X2, and X2 has priority over X3, because older transactions (with smaller timestamps) are given precedence. 

Here is a list of some restrictions that apply when creating a view: 

  • A view can be created only in the current database. 
  • A computed value in a view cannot be changed (updated) through the view.  
  • Integrity constraints determine whether INSERT and DELETE operations are allowed through the view. 
  • Full-text index definitions cannot be applied to a view. 
  • A temporary view cannot be created. 
  • A view cannot be created on a temporary table. 
  • A DEFAULT definition cannot be associated with a view. 
  • Only INSTEAD OF triggers can be associated with a view.
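
For reference, a minimal sketch of creating and querying a simple view over the Students table used earlier (the view name is illustrative):

CREATE VIEW StudentNames AS 
SELECT studentId, studentName 
FROM Students; 

-- The view can then be queried like a table 
SELECT * FROM StudentNames WHERE studentId < 100; 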

Advanced

Partitioning is the process of dividing a very large table into several smaller manageable pieces. Benefits of partitioning include faster queries, faster data loading, and faster purging of old data. The benefits of partitioning are limited by your choice of partition key and granularity.

There are two ways to split a table: horizontally and vertically. Vertical partitioning places different columns in different partitions, while horizontal partitioning places subsets of rows in different partitions based on the partition key. For example, company sales data can be split horizontally based on sales date.
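
Partitioning syntax is vendor-specific. As a hedged sketch using MySQL-style range partitioning on the sales-date example above (the table and column names are illustrative):

CREATE TABLE Sales 
( 
saleId INT, 
saleDate DATE, 
amount DECIMAL(10,2) 
) 
PARTITION BY RANGE (YEAR(saleDate)) 
( 
PARTITION p2022 VALUES LESS THAN (2023), 
PARTITION p2023 VALUES LESS THAN (2024), 
PARTITION pmax  VALUES LESS THAN MAXVALUE 
); 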

A row-based database stores data on disk row by row, whereas a columnar database stores data column by column. Each method has its own advantages: the row-based layout is fast and efficient for operations on whole rows, and the column-based layout is fast and efficient for operations on columns, for example aggregating large volumes of data over a subset of columns. 

Operations that need the complete row are typically write operations, for example INSERT, DELETE, and UPDATE. Operations that need only some columns are typically read operations, for example SELECT, GROUP BY, and JOIN. From this we can say that columnar databases are good for analytical workloads and row-based databases are good for transaction processing.

Both OLTP and OLAP are online processing systems. OLTP stands for Online Transaction Processing and it is a system for managing transaction-oriented applications. OLAP stands for Online Analytical Processing and it is a system for managing analytical queries.

The main difference between the two systems is that OLTP is a write-intensive system and OLAP is a read-intensive system. This difference has significant implications for implementation. For example, it is very important to employ good concurrency control in an OLTP system, but for read-intensive operations this is not a big issue. Another difference between the two systems is that OLTP queries are very simple and return a relatively small number of records, whereas OLAP queries are very complex with many complex joins and aggregations.  

Another difference is that OLTP systems, due to their real-time nature, often follow a distributed architecture to avoid single points of failure, whereas OLAP systems use a centralized architecture.  

Besides, in most DBMS, OLTP is a row-based database and OLAP is a column-based database.

Isolation is the third characteristic of the ACID property. With this property, our goal is to make all transactions completely separate from each other (serializable). However, some applications do not require complete isolation. Therefore, we define some isolation levels that are less strict than full isolation. Generally, there are five isolation levels defined.  

  1. Read Uncommitted: There are no locks at all. Concurrent transactions can read or write uncommitted data from other transactions; in database terminology, the DBMS is said to allow dirty reads.  
  2. Read Committed: At this level, the DBMS does not allow dirty reads. Each transaction holds a read/write lock on the current row and releases it only when it commits its changes. This isolation level still allows non-repeatable reads: if a transaction reads the same row twice, it may get different values. So dirty reads are prevented, but non-repeatable reads are still possible. 

  3. Repeatable Read: As explained above, the problem with the read committed isolation level is non-repeatable reads. To avoid them, each transaction holds read locks on the rows it reads and write locks on the rows it writes (inserts, updates, deletes, etc.) until it commits its changes; this level is repeatable read. However, one scenario can still make results inconsistent: if another transaction inserts or deletes rows, a range query run twice within the same transaction can return two different results. This scenario is called a phantom read.

  4. Serializable: Serializable is the highest isolation level. As we saw with repeatable read, phantom reads can still occur; to prevent them, locks must be held on the range (or the entire table), not just on individual rows.  

  5. Snapshot: This isolation level is different from the others discussed so far, which are based on locks and blocking; snapshot isolation does not use locks. With this isolation level, when a transaction modifies a row (that is, inserts, updates, or deletes it), the previously committed version of the modified row is copied to the temporary database (tempdb) and versioned; this is known as row versioning. When another session tries to read the modified object, the committed version from tempdb is returned to that operation.  

If talking about snapshot isolation sounds fundamentally different than other isolation levels, then it is. Snapshot isolation is based on an optimistic model, while other isolation levels are based on a pessimistic concurrency control model. The optimistic model assumes conflicts are rare, chooses not to prevent conflicts, and deals with conflicts when they do occur.  
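
A minimal sketch of selecting an isolation level for a session, using SQL Server-style syntax on the Students table from earlier:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; 

BEGIN TRANSACTION; 
SELECT studentName FROM Students WHERE studentId = 42;   -- read locks held until commit 
COMMIT; 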

A must-know for anyone heading into the technical round, this is a frequently asked Database interview question. 

One of the most common DBMS viva questions, don't miss this one. Here is how you can explain these terms -

  • Lock Escalation: Database locks can be held on rows, pages, or entire tables or indexes. While a transaction is in progress, the locks it holds consume resources. In lock escalation, the system converts many fine-grained locks into a single coarser-grained lock (for example, many row locks on a page into one page lock, or many page locks into one table lock) in order to recover the resources occupied by the granular locks.  
  • Lock Contention: Lock contention occurs when multiple operations request exclusive locks on the same data, so operations must wait in a queue. Chronic lock contention means some part of the database has become a hot spot; that data should be subdivided so that more operations can acquire exclusive locks at the same time. Lock contention can become a bottleneck when scaling up a database.  
  • Deadlock: A deadlock is a situation in which some transactions wait indefinitely for each other to release locks. There are two main ways to deal with it. One is the lazy method: do nothing up front, detect when the problem occurs, and abort and restart one of the operations to break the deadlock. The other approach is proactive: prevent deadlocks from arising in the first place. 

Hashing is a lookup technique. Basically, it is how to map keys to values. A hash function converts a string into a fixed-length value (usually shorter). This value can be used as an index to store the original element.  

Hashing can be used to index and search for items in a database within a given amount of time using the appropriate hash function. This is faster than other search techniques. 

Advantage of Hashing: 

  • Hash tables are ideal data structures for point lookups (also known as equality queries), because they can search, insert, and delete data in (average) constant time regardless of the input size. 

Disadvantages of Hashing: 

  • Hashing may not always be the best option. For small data, for example, the cost of a good hash function makes hashing more expensive than a simple sequential lookup. 
  • Another situation is range scan operations (aka range queries). A B+ tree is the ideal data structure for this operation. 
  • Next situation is when looking for a substring or prefix match. Hashes are essentially useless for these operations.  
  • Another drawback of hashing is scalability. Hash table performance degrades as the database grows (more collisions, higher collision resolution costs).  
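
Support for hash indexes is vendor-specific. As a hedged sketch, PostgreSQL-style syntax for creating one on the Students table from earlier looks like this (the index name is illustrative):

-- Hash index: good for equality lookups such as WHERE studentName = '...' 
CREATE INDEX idx_students_name_hash ON Students USING HASH (studentName); 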

A staple in database interview questions for experienced, be prepared to answer this one.  

B+ Tree is a data structure of the B-Tree family. This data structure and its variants are very popular for indexing purposes. This tree is known as a self-balancing tree and provides a mechanism to ensure that the nodes are at least half full. In a B+ tree, data is stored in leaf nodes, which are sequentially linked. These sequential links between leaf nodes allow sequential access to data without traversing the tree structure. This sequential access allows for quick and efficient range scans.  

B+ trees allow search, sequential access, insertion, and deletion in logarithmic time. For a better understanding, we can draw a B+ tree ourselves using one of the various visualization tools available.  

Database systems typically compare a B+ tree data structure to a hash table. Here I'm trying to explain the pros and cons of B+ trees and hash tables. The advantage of B+ trees is in range queries and substring searches with the LIKE command. Hash indexes for equality queries, on the other hand, outperform B+ trees. Another advantage of B+ trees is that they scale easily with the data, making them suitable for storing enormous amounts of data on disk.  

A common follow-up question is always asked that is, what is the difference between B+ trees and Binary Search Trees (BST)? A B+ tree is a generalization of BST that allows tree nodes to have more than two children.  

If someone asks about the difference between B-trees and B+ trees, there are two things to say. First, in a B+ tree, records are stored only in the leaves, and internal nodes store only keys (pointers); in a B-tree, keys and records can be stored in both internal and leaf nodes. Second, the leaf nodes of a B+ tree are linked together, which is not the case in a B-tree.

A stored procedure can be thought of as a semi-program: a series of SQL statements that performs a specific task. If the task is a common one, we can store the query in a procedure and run it on demand instead of writing it every time. Below is the simple structure of a procedure.

CREATE PROCEDURE <Procedure-Name> 
AS 
BEGIN 
    <SQL Statements> 
END 

After creating the procedure, we can execute it with the EXECUTE command as needed. 

EXECUTE <Procedure-Name> 

Stored procedures have many advantages. Most important is reusability of SQL code. If the procedure is used frequently, it helps not to write the code multiple times. Another advantage is that using a distributed database reduces the amount of information sent over the network.  
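
As a concrete sketch (the procedure name and parameter are illustrative), a parameterized procedure over the Students table from earlier might look like this:

CREATE PROCEDURE GetStudentById 
    @id INT 
AS 
BEGIN 
    SELECT studentId, studentName 
    FROM Students 
    WHERE studentId = @id; 
END; 

-- Run it on demand 
EXECUTE GetStudentById @id = 42; 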

Triggers are stored procedures that are executed automatically before or after an event occurs. These events are database operations such as DML, DDL, DCL, or LOGON/LOGOFF statements. The general trigger syntax is:  

CREATE [ OR ALTER ] TRIGGER [ Trigger-Name ]  
[BEFORE | AFTER | INSTEAD OF]  
{[ INSERT ] | [ UPDATE ] | [ DELETE ]}  
ON [table-name]  
AS 
{SQL Statement} 

The uses of triggers include validating transaction validity, enforcing referential integrity, logging events, auto-generating some derived columns, and security authentication before and after user login.  

When asked about the difference between a trigger and a stored procedure, the answer is that a trigger cannot be called by itself. These are called during events. By contrast, stored procedures are independent queries and can be called independently.  
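
As a hedged, concrete sketch using SQL Server-style AFTER trigger syntax (the audit table and trigger name are illustrative, and the Students table is the one defined earlier):

CREATE TABLE StudentAudit (studentId INT, changedAt DATETIME); 

CREATE TRIGGER trg_student_insert 
ON Students 
AFTER INSERT 
AS 
BEGIN 
    -- Log every newly inserted student row 
    INSERT INTO StudentAudit (studentId, changedAt) 
    SELECT studentId, GETDATE() FROM inserted; 
END; 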

A staple in DBMS interview questions and answers, be prepared to answer this one. 

As we know, there are many levels of normalization. The purpose of normalization is to avoid redundancies and dependencies. However, this goal cannot be achieved in one step. With each normalization step (type), we get closer to the goal. 

Let's start with the Unnormalized Form (UNF). In UNF, table cells can hold multiple values (they are non-atomic) and duplicate rows may exist. To convert UNF to First Normal Form (1NF), we split these values into separate cells so that each cell holds a single value, and we remove duplicate rows so that every record is unique. 

The next step is to introduce primary and foreign keys. In this step, we split the 1NF table into new tables and join them with primary and foreign keys. When every non-key attribute in these new tables is fully functionally dependent on the primary key, we have reached Second Normal Form (2NF).  

2NF greatly reduces redundancies and dependencies, but there is still room for improvement. The next step is to remove transitive functional dependencies, which exist when changing one non-key column may require changing another non-key column. To remove them, those non-key columns are moved into a separate table. Third Normal Form (3NF) is reached when no transitive functional dependencies remain.  

In nearly all databases, 3NF is the point at which the database cannot be decomposed into higher normalized forms. However, in some complex databases there are situations where a higher form of normalization can be achieved. Higher-order normalization forms are Boyce-Codd Normal Form (BCNF), 4NF, 5NF, and 6NF. 
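
As a small sketch of the 3NF step, assume an illustrative Students table in which city determines zipCode (a transitive dependency through the non-key column city); splitting the table removes that dependency:

-- Before: zipCode depends on city, not directly on studentId 
-- Students(studentId, studentName, city, zipCode) 

-- After: the transitive dependency is moved into its own table 
CREATE TABLE Cities (city VARCHAR(40) PRIMARY KEY, zipCode VARCHAR(10)); 
CREATE TABLE Students3NF 
( 
studentId INT PRIMARY KEY, 
studentName VARCHAR(20), 
city VARCHAR(40) REFERENCES Cities(city) 
); 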

Two-tier Architecture

A two-tier architecture corresponds to a basic client-server architecture. A two-tier architecture allows a client-side application to communicate directly with a server-side database. For example, the Railway Reservation System, Contact Management System, etc. These are common examples of two-tier architecture.

Three-tier Architecture:

A three-tier architecture includes an additional layer between the client and the server. The three-tier architecture was introduced to make systems more secure and easier to use, with a GUI that makes them more accessible. In this architecture, client-side applications interact with an application server, which in turn communicates with the database system. Examples include a registration form built from text boxes, labels, and buttons, or a website accessed over the internet.

This is one of the most frequently asked database interview questions. 

Logical and physical database designs can be differentiated using several parameters, including the task performed, the criteria used to make choices, and the result produced. Let us look at the difference between logical and physical database design in more detail based on these parameters. 

Logical Database Design: The task of logical database design is to map (convert) the conceptual schema from a high-level data model into a relational database schema. This mapping can be done in two stages: the first stage is a mapping that is system-independent but data-model-dependent, and in the second stage the schema is tuned to a specific DBMS. The result can be expressed as DDL statements in the language of the chosen DBMS, specifying the schemas at the conceptual and external levels of the database system. However, if the DDL statements include physical design parameters, the complete DDL specification must wait until the physical database design phase is finished. 

Physical Database Design: The main task of physical database design is to specify how the database will be stored, in terms of physical storage structures, record placement, and indexes. The choices made in physical design are evaluated against criteria such as response time, disk space utilization, and transaction throughput. Its output is the initial determination of the database's file storage structures and access paths, which corresponds to the internal schema defined in the data storage definition language. 

Database design is divided into many distinct phases. Logical database design and physical database design are two of them. This separation is generally based on the concept of a three-tier architecture of DBMSs that provides data independence. This separation, therefore, leads to data independence, since the output of the logical database design is the conceptual and external level schema of the database system independent from the output of the physical database design, which is the internal schema.  

Expect to come across this, one of the most popular DBMS interview questions on database design.

Here is the list of different types of failures: 

  • Statement Failure 
    • Bad data type 
    • Insufficient space 
    • Insufficient privileges 
  • User Process Failure 
    • An abnormal disconnect performed by the user. 
    • Abnormal termination of the user's session. 
    • An address exception raised by the user's program. 
  • User Error 
    • The user drops a table. 
    • Data damage caused by a user's modification. 
  • Instance Failure 
  • Media Failure 
  • Alert Logs 
    • The alert log records informational and error messages. 
    • It also records all instance startups and shutdowns.

RAID is an acronym for Redundant Array of Inexpensive (or Independent) Disks. 

RAID is a method of combining multiple hard drives into one logical entity (grouping two or more hard drives so that they appear as one device to the host system). RAID technology was developed to overcome the fault tolerance and performance limitations of traditional hard disk storage. It can provide fault tolerance and higher throughput than a single disk or group of independent disks. Once considered a complex and relatively specialized storage solution, arrays are now easy to use and essential for a wide range of client/server applications.  

If the DBMS has reported to the user that a transaction completed successfully, the effects of that transaction must persist even if the system crashes before all changes have been written to disk.

Durability is the name of this property. As transactions are committed to the database, the durability property ensures that the data is stored in non-volatile storage and survives system failures. This is why this property is so important in a DBMS.


Top Database Management System Tips and Tricks

Earlier, we covered DBMS interview questions and answers for job roles in the database management field. There are also some tips and tricks to follow for efficient and productive work when managing large database systems. Here is the list: 

  • Follow Your Own Security Instructions: Monitoring and other data collection solutions should not run as SA or Administrator; use role-based, LDAP, or other more advanced security approaches. Also back up all data, especially if your database management data contains sensitive information. This is especially true when capturing queries and parameters.  
  • Don’t Spread Data Everywhere: Tracking the location and content of database management data requires a well-managed and possibly centralized point. If it is in a spreadsheet, random table, proprietary data store, XML file, CSVs, etc., it becomes very costly and difficult to get answers.  
  • Collect What You Need Only When You Need It: Just because you can collect some data doesn't mean you should collect it, or collect it repeatedly. Ask whether you need it beyond legal data collection requirements. For example, do not run profiler traces for all events all the time; capture only the events you need, when you need them.
  • Know What’s Worth Collecting: This means you need to understand what each metadata element in your system represents. Are all stats since the last service restart? Since they were turned on? Only for the last two weeks? 2 hours? Does it really do what the name says? Understanding the data requires good training and learning.  
  • Establish a Good Archiving Strategy: Most of the time, storage is free, but when it is not, we need a good archiving strategy because collected data also comes with all sorts of costs, such as security, archiving, backup, recovery, management, etc.  
  • Don’t Alert on Every Event: Alert burnout is dangerous. If you have too many alerts, then you easily learn how to turn them off or ignore them without paying attention to serious alerts. We all do this. Don't be the person whose phone rings with a new alert every two minutes.  
  • Backup/Restore: Hardware can fail, and data can be damaged by a faulty script. Always keep a backup copy of your database on hand and a tested restore process. This is not an exaggeration; it is simply the right thing to do.  
  • Here are some tips for modern database management: 
    • Set definite business goals. 
    • Establish good policies and procedures, including backup and recovery processes. 
    • Make security your main priority. 
    • Focus on the good quality of data. 
    • Try to reduce duplicate data. 
    • Make the data easily accessible. 

How to Prepare for Database Management System (DBMS) Interview?

It is very important to study DBMS for placement preparations because it is asked in almost all technical interview rounds. Also, tech companies ask questions related to database management systems. Nowadays, there are several job roles being offered by the top companies for Database Management. Learn about these job roles and study the necessary skills for these roles.  

To handle DBMS interview questions for these job roles smoothly, you must have thorough knowledge of database management systems. Study all the concepts of DBMS, including DBMS query questions and DBMS and SQL interview questions. To study better, here are some topics that clarify where to start: introduction (including transaction concurrency and deadlocks), ACID properties in DBMS, concurrency control protocols, lock-based protocols, etc. Prepare DBMS interview questions and answers related to these topics, and refer to the most asked DBMS interview questions in this article. If you want to learn more about DBMS, check out the Computer Programming course. 

Top Companies Hiring for these DBMS Job Roles

  • Epsilon 
  • TCS 
  • Cognizant 
  • Infosys 
  • Tech Mahindra 
  • Capgemini 
  • Media.net 
  • Novartis 
  • Accenture etc. 

What to Expect in DBMS Interview?

The interviewer will check your knowledge and understanding of database management systems, and any prior experience with DBMS-related roles will help you. The interviewer will ask you questions related to the core topics of DBMS, including the following: 

  • Database Basics 
  • Architecture and Models 
  • ER Model 
  • Relational Database Model 
  • Relational Algebra 
  • Relational Calculus 
  • Functional Dependencies 
  • Normalization 
  • Transaction and Concurrency Controls 
  • SQL 
  • Queries 

Besides these conceptual questions, the interviewer will assess your skills and your understanding of the job role and the organization. Be confident and honest while answering such questions. 

Summary

A database management system (DBMS) is a collection of applications and tools that facilitate the creation and management of databases.  

The world currently generates a massive amount of data every day, and a database management system (DBMS) is needed to store, analyze, and derive results from this data. Data engineering covers designing, building, and transforming data pipelines to make them easier for data scientists and big data engineers to use. In this article, we discussed DBMS interview questions, including normalization questions, which will help you ace the interview and better understand the topics.  

This article contains questions that are categorized based on your experience, such as: 

  • Freshers (DBMS Basic Interview Questions) 
  • Intermediate (DBMS Intermediate Interview Questions) 
  • Experienced (DBMS Advanced Interview Questions) 

We discussed various job positions in the database management field and the top companies offering them. This article covered DBMS interview questions asked by companies like Infosys and TCS, which will help you prepare for a particular position. A Database course is a good resource if you want to learn more about NoSQL databases and the basics. 

In this article, we shed some light on what to expect in database interview questions and how to prepare for those interviews. Refer to these DBMS interview questions and answers to perform well in your interview.
