Sami Fayoumi

Syntax Highlighting SQL in strings

2022-11-10

I write most of my backend code in Go, with PostgreSQL as my primary database. Queries are written in Go strings, and have no SQL syntax highlighting. I never considered trying to apply syntax highlighting to SQL strings, but Neovim and Tree-sitter made this too easy to ignore.

The Solution

While configuring Neovim, I learned about Tree-sitter and its syntax highlighting and querying capabilities. Thanks to the generosity of Neovim core maintainer TJ DeVries and other content creators, I now have a simple method to add highlighting to code embedded in strings. Here's what the process looks like, with a worked example.

Adding Highlighting

Tree-Sitter Plugins

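Install the nvim-treesitter and nvim-treesitter/playground plugins. The SQL highlighting applied later also needs the go and sql parsers, which can be installed with :TSInstall go sql.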

Find code to query against

You'll want to find a file that represents your use case. I'll use the example Go code below, copied from an example snippet in the database/sql package:

screenshot of go code with a sql string without syntax highlighting

As you can see, no syntax highlighting is applied to the SQL string.

Open Tree-sitter playground

Open the Tree-sitter playground with the command :TSPlaygroundToggle. This opens a buffer containing the parsed syntax tree. The Tree-sitter representation of the example snippet is shown below.

short_var_declaration [60, 2] - [72, 15]
  left: expression_list [60, 2] - [60, 5]
    identifier [60, 2] - [60, 5]
  right: expression_list [60, 9] - [72, 15]
    call_expression [60, 9] - [72, 15]
      function: selector_expression [60, 9] - [72, 8]
        operand: call_expression [60, 9] - [72, 3]
          function: selector_expression [60, 9] - [60, 27]
            operand: identifier [60, 9] - [60, 11]
            field: field_identifier [60, 12] - [60, 27]
          arguments: argument_list [60, 27] - [72, 3]
            identifier [60, 28] - [60, 31]
            raw_string_literal [60, 33] - [69, 2]
            call_expression [70, 3] - [70, 22]
              function: selector_expression [70, 3] - [70, 12]
                operand: identifier [70, 3] - [70, 6]
                field: field_identifier [70, 7] - [70, 12]
              arguments: argument_list [70, 12] - [70, 22]
                interpreted_string_literal [70, 13] - [70, 17]
                identifier [70, 19] - [70, 21]
            call_expression [71, 3] - [71, 24]
              function: selector_expression [71, 3] - [71, 12]
                operand: identifier [71, 3] - [71, 6]
                field: field_identifier [71, 7] - [71, 12]
              arguments: argument_list [71, 12] - [71, 24]
                interpreted_string_literal [71, 13] - [71, 18]
                identifier [71, 20] - [71, 23]

Reading the syntax tree above, we can recognize elements of our example snippet. The root of this segment is the short_var_declaration, with the left and right sides of the declaration represented as node fields. Moving your cursor over a node in the playground highlights the corresponding code in the source buffer. Getting familiar with the tree pays off when you write your query.

Querying the right nodes

Open the query editor by pressing o while focused on the playground, and begin. The query syntax is described in the official Tree-sitter documentation.

In our example, I start by capturing the argument list of a call expression. I'm also limiting the query to calls whose function is a selector expression with a field identifier, that is, method-style calls such as db.QueryRowContext(...).

(call_expression
  function: (selector_expression
    field: (field_identifier)
  )
  arguments: (argument_list) @capture
)

Hovering over the @capture in the playground highlights the matching node (argument_list) in the original buffer.

function argument list highlighted by tree-sitter query capture

You'll notice that every other argument_list node matching the query is highlighted as well. To fix this, we need to narrow the query down to the argument_list nodes of calls to a specific function name.

Query refinement

Tree-sitter has multiple predicates that can be applied to capture names or strings. I'll use the match predicate to match the function name whose argument list I'm interested in capturing. I also capture the value of the field_identifier in @funcName for use in the match predicate. To limit the capture to just the query string in the argument list, I add the raw_string_literal node.

(call_expression
  function: (selector_expression
    field: (field_identifier) @funcName (#match? @funcName "^QueryRowContext$")
  )
  arguments: (argument_list
    (raw_string_literal) @capture
  )
)

This query now matches just the query string in the argument list of calls to the QueryRowContext function.
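For readers more at home in Go than in query syntax, the same selection logic can be sketched with the standard library's go/ast package. This is only an analogy: go/ast is not Tree-sitter and its node types differ, and both the sample source and the rawStringArgs helper are made up for illustration rather than taken from the screenshot.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// sample is a hypothetical stand-in for the code being queried.
// It only needs to parse, so db, ctx, sql, id and name are left undeclared.
const sample = "package demo\n\n" +
	"func lookup() {\n" +
	"\terr := db.QueryRowContext(ctx, `select name from users where id = @id`, sql.Named(\"id\", id)).Scan(&name)\n" +
	"\t_ = err\n" +
	"}\n"

// rawStringArgs returns the raw string literals passed directly as arguments
// to method-style calls (x.Name(...)) whose method name equals funcName.
func rawStringArgs(src, funcName string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	var found []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr) // roughly: (call_expression ...)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr) // roughly: function: (selector_expression ...)
		if !ok || sel.Sel.Name != funcName {    // roughly: the #match? predicate on the field name
			return true
		}
		for _, arg := range call.Args { // roughly: arguments: (argument_list ...)
			lit, ok := arg.(*ast.BasicLit)
			// go/ast has one string literal kind; a leading backtick marks a raw string.
			if ok && lit.Kind == token.STRING && strings.HasPrefix(lit.Value, "`") {
				found = append(found, lit.Value)
			}
		}
		return true // keep walking so nested calls are visited too
	})
	return found, nil
}

func main() {
	lits, err := rawStringArgs(sample, "QueryRowContext")
	if err != nil {
		panic(err)
	}
	for _, l := range lits {
		fmt.Println(l)
	}
}
```

As with the query, the string passed to sql.Named is ignored because it belongs to a different call, and the returned literal still includes its backtick delimiters.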

One last refinement

Working on the file, I notice another query string in the argument list of another function call.

another sql string snippet

The arguments of this call are parsed by tree-sitter into:

arguments: argument_list [92, 30] - [92, 92]
  identifier [92, 31] - [92, 34]
  interpreted_string_literal [92, 36] - [92, 91]

To match that string as well, I need to add the function name to the match regex, grouping the two names so that the ^ and $ anchors apply to both, and use an alternation to match either a raw_string_literal or an interpreted_string_literal.

(call_expression
  function: (selector_expression
    field: (field_identifier) @funcName (#match? @funcName "^(QueryRowContext|QueryContext)$")
  )
  arguments: (argument_list
    [
      (raw_string_literal)
      (interpreted_string_literal)
    ] @capture
  )
)
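One thing to watch with an anchored list of names: ^ and $ bind more tightly than |, so without a group each branch is anchored on one side only. The sketch below illustrates this with Go's regexp package as a stand-in for Neovim's regex engine, on the assumption that both give alternation the lowest precedence. The extra method names QueryRowContextTx and PrepareQueryContext are made up for the demonstration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Ungrouped: parsed as (^QueryRowContext) OR (QueryContext$).
	loose = regexp.MustCompile(`^QueryRowContext|QueryContext$`)
	// Grouped: both names must match the whole identifier.
	strict = regexp.MustCompile(`^(QueryRowContext|QueryContext)$`)
)

func main() {
	names := []string{
		"QueryRowContext",
		"QueryContext",
		"QueryRowContextTx",   // hypothetical: only shares a prefix
		"PrepareQueryContext", // hypothetical: only shares a suffix
	}
	for _, name := range names {
		fmt.Printf("%q loose=%t strict=%t\n", name, loose.MatchString(name), strict.MatchString(name))
	}
}
```

Both patterns accept the two intended names, but only the ungrouped one also accepts the two look-alikes.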

Applying Syntax Highlighting

Now that we've matched the SQL strings in this file, we can apply SQL syntax highlighting. To make the necessary changes, we'll use the :TSEditQueryUserAfter injections go command in Neovim to create an injections file for Go in the after directory of our Neovim config. We can copy our query to this file and add language injection: replacing the @capture name with @sql tells Neovim to parse and highlight the captured text as SQL. Since the captured text also includes the string delimiters, let's use Neovim's #offset! directive to shift the start of the range one column forward and the end one column back, which drops the quotes.

(call_expression
  function: (selector_expression
    field: (field_identifier) @funcName (#match? @funcName "^(QueryRowContext|QueryContext)$")
  )
  arguments: (argument_list
    [
      (raw_string_literal)
      (interpreted_string_literal)
    ] @sql
  )
  (#offset! @sql 0 1 0 -1)
)
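To make the four offset numbers concrete, here is a small Go sketch that applies them to the two string ranges reported by the playground earlier. It assumes the arguments are read as start row, start column, end row, end column deltas. The Range type and its Offset method are hypothetical helpers written for this illustration and are not part of Neovim or Tree-sitter.

```go
package main

import "fmt"

// Range mirrors the "[row, col] - [row, col]" spans printed by the playground.
type Range struct {
	StartRow, StartCol, EndRow, EndCol int
}

// Offset adds one delta to each component, in the same order as the
// arguments in (#offset! @sql 0 1 0 -1).
func (r Range) Offset(startRow, startCol, endRow, endCol int) Range {
	return Range{
		StartRow: r.StartRow + startRow,
		StartCol: r.StartCol + startCol,
		EndRow:   r.EndRow + endRow,
		EndCol:   r.EndCol + endCol,
	}
}

func main() {
	raw := Range{60, 33, 69, 2}          // raw_string_literal from the first tree
	interpreted := Range{92, 36, 92, 91} // interpreted_string_literal from the second tree

	fmt.Println(raw.Offset(0, 1, 0, -1))         // {60 34 69 1}
	fmt.Println(interpreted.Offset(0, 1, 0, -1)) // {92 37 92 90}
}
```

In both cases the rows are untouched and each end of the range moves one column inward, past the opening and closing delimiter.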

Save the injections file and reload the editor to see the changes. The result, after uppercasing keywords, shows a more readable SQL string.

sql in string with full syntax highlighting

And that's the end of this post, but this is just the beginning of my Neovim+Tree-sitter journey.

Some ideas on my to-do list that are inspired by this post:

  • Highlighting for embedded Lua strings, and Lua autocomplete in Go files for Redis scripting
  • Go HTML templates with syntax highlighting
  • Sequel blog post for adding language autocomplete, DB completion from the database schema, etc.