MySQL
MySQL
1. How to Identify Slow Queries in MySQL
Answer:
During our stress testing, we noticed that some API responses were very slow, exceeding 2 seconds. Since our system integrates with SkyWalking/Prometheus for monitoring, its reports helped identify slow APIs and pinpoint which part of the API (including SQL execution) was slow.
If such monitoring systems are unavailable, MySQL itself provides a slow query log feature. You can enable it in the MySQL configuration file and set a threshold (e.g., 2 seconds). Queries exceeding this time will be logged, making it easier to identify slow queries.
2. How to Analyze Slow Queries
Answer:
If a query is slow, we use the EXPLAIN command to analyze its execution plan. Key metrics include:
- Key and key_len: Check if the query uses indexes and whether they are effective.
- Type: Review the scan type (e.g., full table scan or range scan).
- Extra: Look for optimizations (e.g., avoid table backtracking or unnecessary sorting).
For example, if backtracking occurs, we might add indexes or optimize the returned fields to improve performance.
3. What Is an Index?
Answer:
Indexes are data structures that help MySQL retrieve data efficiently. They reduce I/O costs, enable faster lookups, and can also help sort data, reducing CPU usage.
4. Underlying Data Structure of Indexes
Answer:
The InnoDB engine uses B+ trees for index storage.
Advantages of B+ trees:
- More child nodes per node, reducing tree depth.
- Lower disk I/O since non-leaf nodes store only keys and pointers.
- Suited for range queries because leaf nodes form a doubly linked list.
5. Differences Between B-Trees and B+ Trees
Answer:
- Data storage:
- B-Trees store data in both internal and leaf nodes.
- B+ Trees store all data in leaf nodes, ensuring more consistent query performance.
- Range queries:
- B+ Trees excel in range queries as leaf nodes form a linked list.
6. Clustered vs. Non-Clustered Indexes
Answer:
- Clustered Index:
Data and index are stored together. In B+ trees, leaf nodes store entire rows. Each table usually has one clustered index, typically the primary key. - Non-Clustered Index:
Data and index are stored separately. Leaf nodes store primary key values instead of full rows. Tables can have multiple non-clustered indexes.
7. What Is Backtracking in Queries?
Answer:
Backtracking occurs when a secondary index points to the primary key, requiring an additional lookup in the clustered index to retrieve full row data.
8. What Is a Covering Index?
Answer:
A covering index is one where all the fields required for a query are included in the index. This avoids backtracking and improves query performance.
9. How to Handle Large Pagination?
Answer:
For large datasets, using LIMIT
with sorting can be slow. Optimize with:
- Covering indexes: Use indexed columns for pagination queries.
- Subqueries: First fetch primary key IDs for the required page, then query using these IDs.
10. Principles for Index Creation
Answer:
- Create indexes for tables with over 100,000 rows.
- Index frequently queried columns (e.g., filters, sorting, grouping).
- Prefer composite indexes that cover query results.
- For fields with low cardinality, place them later in composite indexes.
- Use prefix indexes for long fields.
- Limit the number of indexes to balance read/write performance.
11. When Do Indexes Fail?
Answer:
Indexes fail in cases such as:
- Ignoring the leftmost prefix principle in composite indexes.
- Using wildcard queries starting with
%
. - Performing operations or type conversions on indexed fields.
- Using a range query in a composite index, making subsequent conditions ineffective.
12. SQL Optimization Techniques
Answer:
Optimize SQL by:
- Selecting specific fields instead of
SELECT *
. - Using
UNION ALL
instead ofUNION
. - Using
INNER JOIN
for table joins and ensuring smaller tables are driving tables when usingLEFT JOIN
orRIGHT JOIN
. - Leveraging read-write separation and replication for large datasets.
- Sharding large tables across databases.
13. Table Creation Optimization
Answer:
We followed the Alibaba Development Handbook to choose appropriate field types. For instance:
- Use
TINYINT
,INT
, orBIGINT
for numeric fields. - Use
CHAR
,VARCHAR
, orTEXT
for strings based on size and variability.
14. Index Usage Optimization
Answer:
We follow best practices for index creation, ensuring frequent query columns are indexed, avoiding operations on indexed fields, and controlling the total number of indexes.
15. Personal SQL Optimization Practices
Answer:
- Avoid
SELECT *
; specify required columns. - Prevent index failure by writing query-friendly syntax.
- Prefer
UNION ALL
overUNION
. - Use
INNER JOIN
for better performance in joins.
16. What Are Transaction Properties?
Answer:
Transactions follow ACID principles:
- Atomicity: All operations succeed or fail as a whole.
- Consistency: Data remains consistent across transactions.
- Isolation: Concurrent transactions don't interfere.
- Durability: Changes persist after commit.
17. Issues with Concurrent Transactions
Answer:
Concurrency can lead to:
- Dirty reads: Reading uncommitted changes.
- Non-repeatable reads: Different values for the same query in one transaction.
- Phantom reads: Rows appearing or disappearing during a transaction.
18. Solving Concurrency Issues
Answer:
MySQL uses isolation levels:
- READ UNCOMMITTED: Allows all issues.
- READ COMMITTED: Solves dirty reads.
- REPEATABLE READ: Solves dirty and non-repeatable reads (default in MySQL).
- SERIALIZABLE: Prevents all issues but reduces performance.
19. Difference Between Undo Log and Redo Log
Answer:
- Redo Log: Tracks physical changes for recovery after a crash, ensuring durability.
- Undo Log: Tracks logical changes for rollbacks, ensuring atomicity.
20. Ensuring Isolation with MVCC
Answer:
MVCC uses hidden fields (trx_id
, roll_pointer
), undo logs, and a read view for consistent snapshots.
21. MySQL Master-Slave Synchronization
Answer:
Master-slave replication works as follows:
- Master records changes in a binary log (Binlog).
- Slave reads the Binlog and writes changes to its relay log.
- Slave replays the relay log to sync data.
22. Vertical vs. Horizontal Partitioning
Answer:
- Vertical Partitioning: Splitting databases by functionality (microservices architecture).
- Horizontal Partitioning: Splitting large tables across databases (e.g., sharding by ID).