MySQL

Ruochen ChenNovember 18, 2024About 3 min

MySQL

1. How to Identify Slow Queries in MySQL

Answer:
During our stress testing, we noticed that some API responses were very slow, exceeding 2 seconds. Since our system integrates with SkyWalking/Prometheus for monitoring, its reports helped identify slow APIs and pinpoint which part of the API (including SQL execution) was slow.

If such monitoring systems are unavailable, MySQL itself provides a slow query log feature. You can enable it in the MySQL configuration file and set a threshold (e.g., 2 seconds). Queries exceeding this time will be logged, making it easier to identify slow queries.

2. How to Analyze Slow Queries

Answer:
If a query is slow, we use the EXPLAIN command to analyze its execution plan. Key metrics include:

Key and key_len: Check if the query uses indexes and whether they are effective.
Type: Review the scan type (e.g., full table scan or range scan).
Extra: Look for optimizations (e.g., avoid table backtracking or unnecessary sorting).

For example, if backtracking occurs, we might add indexes or optimize the returned fields to improve performance.

3. What Is an Index?

Answer:
Indexes are data structures that help MySQL retrieve data efficiently. They reduce I/O costs, enable faster lookups, and can also help sort data, reducing CPU usage.

4. Underlying Data Structure of Indexes

Answer:
The InnoDB engine uses B+ trees for index storage.
Advantages of B+ trees:

More child nodes per node, reducing tree depth.
Lower disk I/O since non-leaf nodes store only keys and pointers.
Suited for range queries because leaf nodes form a doubly linked list.

5. Differences Between B-Trees and B+ Trees

Answer:

Data storage:
- B-Trees store data in both internal and leaf nodes.
- B+ Trees store all data in leaf nodes, ensuring more consistent query performance.
Range queries:
- B+ Trees excel in range queries as leaf nodes form a linked list.

6. Clustered vs. Non-Clustered Indexes

Answer:

Clustered Index:
Data and index are stored together. In B+ trees, leaf nodes store entire rows. Each table usually has one clustered index, typically the primary key.
Non-Clustered Index:
Data and index are stored separately. Leaf nodes store primary key values instead of full rows. Tables can have multiple non-clustered indexes.

7. What Is Backtracking in Queries?

Answer:
Backtracking occurs when a secondary index points to the primary key, requiring an additional lookup in the clustered index to retrieve full row data.

8. What Is a Covering Index?

Answer:
A covering index is one where all the fields required for a query are included in the index. This avoids backtracking and improves query performance.

9. How to Handle Large Pagination?

Answer:
For large datasets, using LIMIT with sorting can be slow. Optimize with:

Covering indexes: Use indexed columns for pagination queries.
Subqueries: First fetch primary key IDs for the required page, then query using these IDs.

10. Principles for Index Creation

Answer:

Create indexes for tables with over 100,000 rows.
Index frequently queried columns (e.g., filters, sorting, grouping).
Prefer composite indexes that cover query results.
For fields with low cardinality, place them later in composite indexes.
Use prefix indexes for long fields.
Limit the number of indexes to balance read/write performance.

11. When Do Indexes Fail?

Answer:
Indexes fail in cases such as:

Ignoring the leftmost prefix principle in composite indexes.
Using wildcard queries starting with %.
Performing operations or type conversions on indexed fields.
Using a range query in a composite index, making subsequent conditions ineffective.

12. SQL Optimization Techniques

Answer:
Optimize SQL by:

Selecting specific fields instead of SELECT *.
Using UNION ALL instead of UNION.
Using INNER JOIN for table joins and ensuring smaller tables are driving tables when using LEFT JOIN or RIGHT JOIN.
Leveraging read-write separation and replication for large datasets.
Sharding large tables across databases.

13. Table Creation Optimization

Answer:
We followed the Alibaba Development Handbook to choose appropriate field types. For instance:

Use TINYINT, INT, or BIGINT for numeric fields.
Use CHAR, VARCHAR, or TEXT for strings based on size and variability.

14. Index Usage Optimization

Answer:
We follow best practices for index creation, ensuring frequent query columns are indexed, avoiding operations on indexed fields, and controlling the total number of indexes.

15. Personal SQL Optimization Practices

Answer:

Avoid SELECT *; specify required columns.
Prevent index failure by writing query-friendly syntax.
Prefer UNION ALL over UNION.
Use INNER JOIN for better performance in joins.

16. What Are Transaction Properties?

Answer:
Transactions follow ACID principles:

Atomicity: All operations succeed or fail as a whole.
Consistency: Data remains consistent across transactions.
Isolation: Concurrent transactions don't interfere.
Durability: Changes persist after commit.

17. Issues with Concurrent Transactions

Answer:
Concurrency can lead to:

Dirty reads: Reading uncommitted changes.
Non-repeatable reads: Different values for the same query in one transaction.
Phantom reads: Rows appearing or disappearing during a transaction.

18. Solving Concurrency Issues

Answer:
MySQL uses isolation levels:

READ UNCOMMITTED: Allows all issues.
READ COMMITTED: Solves dirty reads.
REPEATABLE READ: Solves dirty and non-repeatable reads (default in MySQL).
SERIALIZABLE: Prevents all issues but reduces performance.

19. Difference Between Undo Log and Redo Log

Answer:

Redo Log: Tracks physical changes for recovery after a crash, ensuring durability.
Undo Log: Tracks logical changes for rollbacks, ensuring atomicity.

20. Ensuring Isolation with MVCC

Answer:
MVCC uses hidden fields (trx_id, roll_pointer), undo logs, and a read view for consistent snapshots.

21. MySQL Master-Slave Synchronization

Answer:
Master-slave replication works as follows:

Master records changes in a binary log (Binlog).
Slave reads the Binlog and writes changes to its relay log.
Slave replays the relay log to sync data.

22. Vertical vs. Horizontal Partitioning

Answer:

Vertical Partitioning: Splitting databases by functionality (microservices architecture).
Horizontal Partitioning: Splitting large tables across databases (e.g., sharding by ID).