A detailed explanation of SQL parsing and its applications

The database, as a core foundational component, is something that must be protected. Any careless operation in production may cause a serious database failure and, in turn, huge losses to the business.

To avoid such losses, companies usually start with management measures: drawing up database development specifications for R&D staff, requiring DBA review for newly launched SQL, requiring leadership approval for maintenance operations, and so on. Making these measures effective takes solid database training and careful SQL review by DBAs. Many small and medium-sized start-ups can manage their databases this way, by setting specifications, running training, and improving the review process.

As Meituan Dianping's business has grown, the cost of carrying out these measures has kept rising, and relying more on technical means to improve efficiency has drawn more and more attention. The industry already has many tools, such as SQL auditing and optimization-suggestion tools built on the MySQL source code, which greatly reduce the DBA's SQL review burden. So can we keep extending the MySQL source code to help DBAs and R&D staff improve efficiency further, for example with more comprehensive SQL optimization, multi-dimensional slow query analysis, and assisted fault analysis? One of the core technologies behind all of these functions is SQL parsing.

Current state and scenarios

SQL parsing is a complex technology that is generally in the hands of database vendors, although some companies do provide APIs for SQL parsing.

With the rise of MySQL database middleware in recent years, features such as read-write splitting and database and table sharding require extracting table names, database names, and related field values from SQL. Middleware such as Druid (written in Java), MaxScale (written in C), and Kingshard (written in Go) therefore parses SQL partially. Few products, however, really apply SQL parsing technology to database maintenance. The main examples are as follows:

SQLAdvisor, open-sourced by Meituan Dianping. It builds on MySQL's native lexical analysis and combines it with analysis of the where conditions, aggregation conditions, and multi-table join relationships in the SQL to give index optimization suggestions.

The above products have very suitable application scenarios and are widely used in the industry. However, the application scenarios of SQL parsing are far from being fully explored, such as:

Slow query reports at table granularity. For example, one schema may contain tables that belong to different business lines, and each business line wants slow query reports broken down by table.

Generating SQL features. The values in a SQL statement are replaced with question marks so that statements can be classified. Regular expressions can do the same job, but with many bugs; pt-query-digest is an example. In pt-query-digest, every number encountered is replaced with "?", which makes it impossible to distinguish tables that differ only in a numeric suffix.

Confirming and avoiding high-risk operations. For example, a DBA may accidentally drop a table. There is currently no effective tool for rolling back such an operation, and for large tables in particular the consequences can be catastrophic.

Judging SQL validity. For security, audit, and control reasons, Meituan Dianping does not let R&D staff operate databases directly and provides an RDS service instead. Data changes in particular must be approved by the R&D engineer's supervisor. If an engineer submits SQL with a syntax error and RDS cannot tell whether the SQL is valid, unnecessary communication costs follow.
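The numeric-suffix problem mentioned above for SQL features can be reproduced with a few lines of Python. This is a deliberately naive sketch, not pt-query-digest's actual code; the function name and the table names orders_01 and orders_02 are made up for illustration.

```python
import re

def naive_fingerprint(sql: str) -> str:
    """Replace every run of digits with '?' (deliberately naive)."""
    return re.sub(r"\d+", "?", sql)

# Two statements that hit different (hypothetical) tables.
a = naive_fingerprint("select * from orders_01 where id = 5")
b = naive_fingerprint("select * from orders_02 where id = 7")

print(a)       # select * from orders_? where id = ?
print(a == b)  # True: the two tables can no longer be told apart
```

Because the substitution has no notion of tokens, it cannot tell a literal value from digits that are part of an identifier.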

Therefore, in order to allow all businesses in need to easily use the SQL parsing function, we believe that it should have the following characteristics:

Expose the SQL parsing interface directly and make it as simple to use as possible. For example: input SQL, and get back the table names, features, and optimization suggestions.

The use of the interface does not depend on a specific language, otherwise the cost of maintenance and use is too high. For example: provide services through HTTP and other methods.

The journey of a thousand miles begins with one step. Let me first introduce the principles of SQL parsing.

Principles

SQL parsing and optimization belong to the field of compilers and are not fundamentally different from parsing other languages such as C. The stages are lexical analysis, syntax and semantic analysis, optimization, and execution code generation. The corresponding parts of MySQL are shown below:

[Figure: Principles of SQL parsing]

1. Lexical analysis

SQL parsing consists of two parts: lexical analysis and syntax/semantic analysis. Lexical analysis mainly converts the input into tokens, which are either keywords (also called symbols) or non-keywords. For example, analyzing the SQL statement select username from userinfo yields 4 tokens, 2 of which are keywords (select and from):

[Figure: the four tokens produced for the statement above]
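The tokenization described above can be sketched in Python as follows. This is only an illustration of the idea, not MySQL's handwritten lexer: the keyword set is a tiny hypothetical subset, and the token kinds ("keyword", "ident", and so on) are names chosen for this example.

```python
import re

# Hypothetical, tiny keyword set; MySQL's real list is defined in sql/lex.h.
KEYWORDS = {"select", "from", "where", "and", "or"}

TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>>=|<=|<>|!=|[=<>,;()*])
  | (?P<space>\s+)
""", re.VERBOSE)

def tokenize(sql):
    """Split sql into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "word":
            # A word is a keyword only if it appears in the keyword table.
            kind = "keyword" if text.lower() in KEYWORDS else "ident"
        tokens.append((kind, text))
    return tokens

tokens = tokenize("select username from userinfo")
print(tokens)
# [('keyword', 'select'), ('ident', 'username'), ('keyword', 'from'), ('ident', 'userinfo')]
```

The result has 4 tokens, 2 of them keywords, matching the count given in the text.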

Lexical analyzers are normally generated with a tool such as Flex.

MySQL does not use such a tool, however; its lexical analysis is handwritten. The code is in the sql/lex.h and sql/sql_lex.cc files.

The keywords in MySQL are defined in sql/lex.h, as follows:

[Figure: keyword definitions in sql/lex.h]

The core code of lexical analysis is MYSQLlex → lex_one_token in the sql/sql_lex.cc file. Interested readers can download the source code and study it.

2. Syntax analysis

Syntax analysis is the process of generating the syntax tree. It is the most intricate and complicated part of the whole parsing process, although MySQL uses Bison to do it. Even so, how to design suitable data structures and algorithms to store and traverse all the information is well worth studying.

Syntax tree

SQL statement:

select username, ismale from userinfo where age > 20 and level > 5 and 1 = 1

The following syntax tree will be generated:

[Figure: Syntax tree]
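To give a feel for how a syntax tree comes out of a token stream, here is a minimal hand-written recursive-descent sketch for the where condition of the statement above. It is not how MySQL does it (MySQL uses a Bison-generated parser); the grammar, the function names, and the tuple representation of nodes are all assumptions made for this example.

```python
import re

def tokenize(text):
    # Words, integers, and the two operators used in the example.
    return re.findall(r"[A-Za-z_]\w*|\d+|[>=]", text)

def parse_condition(tokens):
    """condition := comparison ('and' comparison)*"""
    node = parse_comparison(tokens)
    while tokens and tokens[0].lower() == "and":
        tokens.pop(0)
        # Each 'and' becomes a new parent of what has been parsed so far.
        node = ("and", node, parse_comparison(tokens))
    return node

def parse_comparison(tokens):
    """comparison := operand ('>' | '=') operand"""
    left = parse_operand(tokens)
    op = tokens.pop(0)
    if op not in (">", "="):
        raise SyntaxError(f"expected '>' or '=', got {op!r}")
    right = parse_operand(tokens)
    return (op, left, right)

def parse_operand(tokens):
    tok = tokens.pop(0)
    return int(tok) if tok.isdigit() else tok

tree = parse_condition(tokenize("age > 20 and level > 5 and 1 = 1"))
print(tree)
# ('and', ('and', ('>', 'age', 20), ('>', 'level', 5)), ('=', 1, 1))
```

Each grammar rule maps to one function, which is roughly the role a Bison rule and its embedded action play in a generated parser.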

Readers who have never worked on a compiler implementation will surely wonder how such a syntax tree is generated. The principles behind it all belong to the field of compilers, and I read up on some of this material while studying the MySQL source code.

Because compilers cover so much ground and my time and energy are limited, I will not explore them in depth. From an engineering perspective, learning how to use Bison to build a syntax tree and solve practical problems may help our work more. The discussion below is based on Bison.

MySQL parse tree generation process

The entire source code is in sql/sql_yacc.yy, which has about 17K lines of code in MySQL 5.6. The SQL statement used in the example is repeated here:

select username, ismale from userinfo where age > 20 and level > 5 and 1 = 1

Part of the code for the parsing process is extracted below. With Bison, SQL parsing is not as difficult as one might imagine, especially once the parsing context is given.

[Code listing: excerpt of the grammar rules from sql/sql_yacc.yy]

Browsing the code above, you will find C++ code embedded in the Bison rules. This C++ code stores the parsed information in the relevant objects. For example, table information is stored in TABLE_LIST, order_list stores the order by clause, and the where clause is stored in Item. With this information, the SQL can be processed further with suitable algorithms.

Core data structure and its relationship

The core structure in SQL parsing is SELECT_LEX, which is defined in sql/sql_lex.h. Only the parts relevant to the example above are listed below.

[Figure: SQL parse tree structure]

In the figure above, the column names username and ismale are stored in item_list, the table name is stored in table_list, and the conditions are stored in where. The Item hierarchy of the where condition is the deepest and its expression is more complicated, as shown in the following figure:

[Figure: Item hierarchy of the where condition]
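The relationship between item_list, table_list, and where can be modelled with a small Python sketch. The classes below are simplified stand-ins invented for illustration, not MySQL's SELECT_LEX or Item definitions, and placing the three AND-ed conditions as siblings under a single AND node is an assumption; a binary nesting would serve the illustration equally well.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Item:
    """A node in an expression tree (loosely modelled on MySQL's Item)."""
    kind: str    # "field", "int", "func" or "cond"
    value: str   # column name, literal text, operator, or AND/OR
    args: List["Item"] = field(default_factory=list)

@dataclass
class SelectLex:
    """Hypothetical, stripped-down counterpart of SELECT_LEX."""
    item_list: List[Item] = field(default_factory=list)   # selected columns
    table_list: List[str] = field(default_factory=list)   # referenced tables
    where: Optional[Item] = None                           # condition tree

def compare(op, column, literal):
    """Build a comparison node such as age > 20."""
    return Item("func", op, [Item("field", column), Item("int", str(literal))])

select = SelectLex(
    item_list=[Item("field", "username"), Item("field", "ismale")],
    table_list=["userinfo"],
    where=Item("cond", "AND", [
        compare(">", "age", 20),
        compare(">", "level", 5),
        Item("func", "=", [Item("int", "1"), Item("int", "1")]),
    ]),
)

def depth(item):
    """Number of levels in the expression tree rooted at item."""
    return 1 + max((depth(arg) for arg in item.args), default=0)

print([i.value for i in select.item_list])  # ['username', 'ismale']
print(select.table_list)                    # ['userinfo']
print(depth(select.where))                  # 3
```

Even in this small model the where tree is three levels deep, while the column and table lists are flat, which is why the where condition is the part that needs the most traversal logic.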

Application of SQL parsing

In order to gain a deeper understanding of the SQL parser, here are two examples of applying SQL parsing:

1. Useless condition removal

"Useless condition removal" belongs to the logical optimization part of the optimizer and can be done using only the SQL itself and the table structure. There are many cases to optimize; the code is in the remove_eq_conds function in the sql/sql_optimizer.cc file. To avoid tedious description and pasting large blocks of code, the following four cases are analyzed with figures:

1 = 1 and (m > 3 and n > 4)

1 = 2 and (m > 3 and n > 4)

1 = 1 or (m > 3 and n > 4)

1 = 2 or (m > 3 and n > 4)

[Figure: useless condition removal, case a]

[Figure: useless condition removal, case b]

[Figure: useless condition removal, case c]

[Figure: useless condition removal, case d]
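The four cases can be worked through with a short Python sketch. It is a simplified illustration of the idea only, not a port of remove_eq_conds: conditions are represented as nested tuples, only binary and/or nodes are handled, and the function names are made up for this example.

```python
def fold_constant(cond):
    """Evaluate a comparison whose operands are both integer literals."""
    _, op, left, right = cond
    if isinstance(left, int) and isinstance(right, int):
        return {"=": left == right, ">": left > right, "<": left < right}[op]
    return cond  # involves a column, so it cannot be decided here

def simplify(cond):
    """Return True, False, or the condition with constant parts removed."""
    kind = cond[0]
    if kind == "cmp":
        return fold_constant(cond)
    left, right = simplify(cond[1]), simplify(cond[2])
    if kind == "and":
        if left is False or right is False:
            return False          # false and x  ->  false
        if left is True:
            return right          # true and x   ->  x
        if right is True:
            return left
    else:  # kind == "or"
        if left is True or right is True:
            return True           # true or x    ->  true
        if left is False:
            return right          # false or x   ->  x
        if right is False:
            return left
    return (kind, left, right)

inner = ("and", ("cmp", ">", "m", 3), ("cmp", ">", "n", 4))

case_a = simplify(("and", ("cmp", "=", 1, 1), inner))
case_b = simplify(("and", ("cmp", "=", 1, 2), inner))
case_c = simplify(("or", ("cmp", "=", 1, 1), inner))
case_d = simplify(("or", ("cmp", "=", 1, 2), inner))

print(case_a == inner)  # True: only m > 3 and n > 4 remains
print(case_b)           # False: the whole condition can never hold
print(case_c)           # True: the whole condition always holds
print(case_d == inner)  # True: only m > 3 and n > 4 remains
```

In cases a and d the constant comparison simply disappears, while in cases b and c the entire where condition collapses to a constant.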

If you are interested in the implementation, you need to understand an important MySQL data structure, the Item class. Because of its complexity, the official MySQL documentation has a section dedicated to it.

Reference link: https://dev.mysql.com/doc/internals/en/item-class.html

Alibaba's MySQL team has a similar article. For a more detailed understanding, see the sql/item_* files in the source code.

2. SQL feature generation

To keep the database, a foundational component, running stably and efficiently, the industry has built many auxiliary systems, such as slow query systems and middleware. After collecting or receiving SQL, these systems need to classify it in order to gather statistics or apply the relevant policies. Classification usually requires the SQL feature. For example, for the SQL:

select username, ismale from userinfo where age > 20 and level > 5;

the SQL feature is:

select username, ismale from userinfo where age > ? and level > ?

The well-known slow query analysis tool pt-query-digest implements this with regular expressions, but that approach has many bugs. Next, we show how to generate SQL features using SQL parsing.

SQL feature generation consists of two parts:

Generate Token array;

According to the Token array, generate SQL features.

First, as described in the lexical analysis section, each SQL keyword corresponds to a 16-bit integer, while non-keywords are uniformly represented by ident, which also corresponds to a 16-bit integer. See the following table:

[Table: tokens and their corresponding 16-bit integer values]

The process of converting a SQL statement into a feature:

[Figure: converting a SQL statement into its feature]
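The two steps, generating the Token array and then generating the feature from it, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the MySQL-based implementation: tokens are (kind, text) pairs instead of 16-bit integers, the keyword set is a hypothetical subset, and the table name orders_01 is invented for the example.

```python
import re

KEYWORDS = {"select", "from", "where", "and", "or"}  # hypothetical subset

TOKEN_RE = re.compile(r"""
    (?P<string>'(?:[^'\\]|\\.)*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>>=|<=|<>|!=|[=<>,;()*.])
  | (?P<space>\s+)
""", re.VERBOSE)

def tokenize(sql):
    """Step 1: turn the statement into an array of (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
    return tokens

def feature(sql):
    """Step 2: rebuild the statement, replacing literal tokens with '?'."""
    parts = []
    for kind, text in tokenize(sql):
        if kind in ("number", "string"):
            parts.append("?")
        elif kind == "word" and text.lower() in KEYWORDS:
            parts.append(text.lower())   # normalize keyword case
        else:
            parts.append(text)           # identifiers are kept verbatim
    # Joining with spaces is cosmetic; only tidy up commas and semicolons.
    return " ".join(parts).replace(" ,", ",").replace(" ;", ";")

print(feature("select username, ismale from userinfo where age > 20 and level > 5;"))
# select username, ismale from userinfo where age > ? and level > ?;
print(feature("SELECT * FROM orders_01 WHERE id = 5"))
# select * from orders_01 where id = ?
```

Because digits inside an identifier belong to a word token rather than a number token, a table name such as orders_01 survives intact and only the literal values are replaced.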

Generating the Token array is easy to do during SQL parsing, and once the array exists, the SQL feature is easy to generate. SQL features are widely used in various systems. pt-query-digest, for instance, classifies SQL by feature, but its regular-expression implementation has many bugs. Here are a few known ones:

[Figure: known pt-query-digest bugs]

Learning suggestions

While exploring the SQL parser and optimizer recently, I went from being at a loss to finding patterns to follow, and gathered some experience along the way that I would like to share:

First, read the relevant books, which give a systematic view of parsers and optimizers. There are few such books on MySQL on the market, however. For a current Chinese-language work, see "Art of Database Query Optimizer: Principle Analysis and SQL Performance Optimization".

Second, read the source code, preferably sticking to one version as a baseline, such as MySQL 5.6.23, because the SQL parsing and optimization code changes constantly, especially across major versions.

Third, debug with GDB often to verify your guesses and check how well you have understood what you read.

Finally, write code to verify your understanding; you only truly master something once you have written it.
