Want to catapult your LLM into the top 10 of Spider, a widely used benchmark for text-to-SQL tasks? Spider challenges LLMs to write SQL queries from text questions using table schemas and additional data. Mastering text-to-SQL can transform business intelligence and enterprise solutions. In this post, we’ll dive into how we reached 79.9% and 78.9% on the Spider dev and test datasets respectively with Llama3 8B, a 19-point improvement over baseline, achieving a top-10 spot on the (now frozen) Spider leaderboard through savvy prompting and fine-tuning on Databricks. Learn how to elevate your LLM with precision.
Zero-shot Prompting for Baseline Performance
Let’s start by evaluating the performance of Meta Llama 3 8B Instruct on the Spider dev dataset using a very simple prompt format consisting of the CREATE TABLE statements that created the tables and a question we’d like to answer using those tables:
{create_table_queries}
-- {question}
SELECT
This type of prompt is often called “zero-shot” because there are no other examples in the prompt. For the first question in the Spider dev dataset this prompt format produces:
CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
…<omitted the singer, concert, and singer_in_concert tables for brevity>
-- How many singers do we have?
SELECT
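As a rough illustration, the prompt format above can be assembled with a small helper. This is a minimal sketch: the function name `build_zero_shot_prompt` and its arguments are our own naming for illustration, not part of Spider or any library, and the schema is shortened to a single toy table.

```python
def build_zero_shot_prompt(create_table_queries, question):
    """Join CREATE TABLE statements, the question as a SQL comment, and a trailing SELECT."""
    schema = "\n".join(stmt.strip() for stmt in create_table_queries)
    # The question is written as a SQL comment; the trailing SELECT nudges
    # the model to continue with a query instead of free-form text.
    return f"{schema}\n-- {question.strip()}\nSELECT"


prompt = build_zero_shot_prompt(
    ["CREATE TABLE singer (\nSinger_ID int,\nName text,\nPRIMARY KEY (Singer_ID)\n)"],
    "How many singers do we have?",
)
print(prompt)
```

Ending the prompt with `SELECT` means the model only has to generate the remainder of the statement.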
Running the Spider benchmark on the dev dataset using this format produces an overall score of 60.9 when measured using execution accuracy and greedy decoding. This means that 60.9% of the time the model produces SQL that, when executed, returns the same results as a “gold” query representing the correct solution.
 | Easy | Medium | Hard | Extra | All |
---|---|---|---|---|---|
Zero-shot | 78.6 | 69.3 | 42.5 | 31.3 | 60.9 |
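To make the idea of execution accuracy concrete, here is a simplified sketch using Python's built-in `sqlite3` module. The helper name `execution_match` and the toy table are assumptions made for illustration. The scores in this post come from the test suite listed in the appendix, which is more thorough than this single-database, order-insensitive comparison.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same rows (order-insensitive)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # SQL that fails to execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (Singer_ID int, Name text, PRIMARY KEY (Singer_ID));
    INSERT INTO singer VALUES (1, 'Joe Sharp'), (2, 'Timbaland'), (3, 'Justin Brown');
    """
)
gold = "SELECT count(*) FROM singer"
results = [
    # Different SQL text, same result: counted as correct.
    execution_match(conn, "SELECT COUNT(Singer_ID) FROM singer", gold),
    # References a table that does not exist: counted as incorrect.
    execution_match(conn, "SELECT count(*) FROM singers", gold),
]
accuracy = 100.0 * sum(results) / len(results)
print(results, accuracy)
```

The point of comparing results rather than SQL text is that many differently written queries can be equally correct.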
With the baseline score established, before we even get into fine-tuning let’s try different prompting strategies to raise the score for the base model on the Spider dev benchmark dataset.
Prompting With Sample Rows
One of the drawbacks of the first prompt we used is that it doesn’t include any information about the data in the columns beyond the data type. A paper on evaluating text-to-SQL capabilities of models with Spider found that adding sampled rows to the prompt led to a higher score, so let’s try that.
We can update the prompt format above so that the create table queries also include the first few rows from each table. For the same question from earlier we now have an updated prompt:
CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
/*
Stadium_ID Location Name Capacity Highest Lowest Average
1 Raith Rovers Stark's Park 10104 4812 1294 2106
2 Ayr United Somerset Park 11998 2363 1057 1477
3 East Fife Bayview Stadium 2000 1980 533 864
*/
…<omitted the singer, concert, and singer_in_concert tables for brevity>
-- How many singers do we have?
SELECT
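One way to produce a schema block with sample rows like the one above is to read both the schema and the rows from the SQLite database that backs each Spider example. The sketch below is illustrative only: `schema_with_rows` is a hypothetical helper, the table is a cut-down version of `stadium`, and the fourth row is made up to show the effect of the row limit.

```python
import sqlite3


def schema_with_rows(conn, table_name, limit=3):
    """Return the table's CREATE TABLE statement followed by a comment holding its first rows."""
    # sqlite_master stores the original CREATE TABLE text for each table.
    create_sql = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()[0]
    # Table names cannot be bound as parameters, so only pass trusted names here.
    cursor = conn.execute(f'SELECT * FROM "{table_name}" LIMIT {int(limit)}')
    header = " ".join(col[0] for col in cursor.description)
    rows = [" ".join(str(value) for value in row) for row in cursor.fetchall()]
    return "\n".join([create_sql, "/*", header, *rows, "*/"])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stadium (Stadium_ID int, Name text, Capacity int)")
conn.executemany(
    "INSERT INTO stadium VALUES (?, ?, ?)",
    [
        (1, "Stark's Park", 10104),
        (2, "Somerset Park", 11998),
        (3, "Bayview Stadium", 2000),
        (4, "Example Park", 5000),  # made-up row, excluded by the limit of 3
    ],
)
block = schema_with_rows(conn, "stadium")
print(block)
```

Space-separated values are ambiguous when a value itself contains spaces, but this matches the compact layout used in the prompts in this post.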
Including sample rows for each table raises the overall score by about 6 percentage points to 67.0:
 | Easy | Medium | Hard | Extra | All |
---|---|---|---|---|---|
Zero-shot with sample rows | 80.6 | 75.3 | 51.1 | 41.0 | 67.0 |
Few-shot Prompting
Few-shot prompting is a well-known strategy used with LLMs where we can improve the performance on a task such as generating correct SQL by including some examples demonstrating the task to be performed. With a zero-shot prompt we provided the schemas and then asked a question. With a few-shot prompt we provide some schemas, a question, the SQL that answers that question, and then repeat that sequence a couple of times before getting to the actual question we want to ask. This generally results in better performance than a zero-shot prompt.
A good source of examples demonstrating the SQL generation task is the Spider training dataset itself. We can take a random sample of a few questions from this dataset with their corresponding tables and construct a few-shot prompt demonstrating the SQL that answers each of these questions. Since we are now using sample rows, we should also make sure that at least one of these examples includes sample rows to demonstrate their usage.
Another improvement we can make on the previous zero-shot prompt is to include a “system prompt” at the beginning. System prompts are typically used to provide detailed guidance to the model that outlines the task to be performed. While a user may ask multiple questions throughout the course of a chat with a model, the system prompt is provided just once before the user even asks a question, essentially establishing expectations for how the “system” should perform during the chat.
With these strategies in mind, we can construct a few-shot prompt that starts with a system message represented as a large SQL comment block at the top, followed by three examples:
/*
You are a helpful assistant who answers questions about database tables
by responding with SQL queries. Users will provide you with a set of
tables represented as CREATE TABLE statements. Each CREATE TABLE
statement may optionally be followed by the first few rows from the
table in order to help write the correct SQL to answer questions. After
the CREATE TABLE statements users will ask a question using a SQL
comment starting with two dashes. You should answer the user's question
by writing a SQL statement starting with SELECT and ending with a
semicolon.
*/
CREATE TABLE "Campuses" (
"Id" INTEGER PRIMARY KEY,
"Campus" TEXT,
"Location" TEXT,
"County" TEXT,
"Year" INTEGER
);
/*
Id Campus Location County Year
1 California State University-Bakersfield Bakersfield Kern 1965
2 California State University-Channel Islands Camarillo Ventura 2002
3 California State University-Chico Chico Butte 1887
*/
… <more tables omitted>
-- Please answer the following question using the tables above.
-- Find the name of the campuses that is in Northridge, Los Angeles or
-- in San Francisco, San Francisco.
SELECT Campus FROM Campuses WHERE Location="Northridge" AND County="Los Angeles"
UNION SELECT Campus FROM Campuses WHERE Location="San Francisco" AND County="San Francisco";
… <two more examples omitted>
CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
/*
Stadium_ID Location Name Capacity Highest Lowest Average
1 Raith Rovers Stark's Park 10104 4812 1294 2106
2 Ayr United Somerset Park 11998 2363 1057 1477
3 East Fife Bayview Stadium 2000 1980 533 864
*/
…<omitted the singer, concert, and singer_in_concert tables for brevity>
-- How many singers do we have?
SELECT
This new prompt results in a score of 70.8, another 3.8 percentage point improvement over our previous score. We have raised the score nearly 10 percentage points from where we started using only simple prompting strategies.
 | Easy | Medium | Hard | Extra | All |
---|---|---|---|---|---|
Few-shot with sample rows | 83.9 | 79.1 | 55.7 | 44.6 | 70.8 |
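The few-shot layout described in this section (a system comment, a few worked examples, then the unanswered question) can be sketched as a small assembly function. The name `build_few_shot_prompt` and the dictionary keys are assumptions for illustration, and the system prompt and schemas are shortened here.

```python
def build_few_shot_prompt(system_prompt, examples, target_schema, target_question):
    """Assemble a system comment, worked examples, and the final unanswered question."""
    parts = [f"/*\n{system_prompt.strip()}\n*/"]
    for example in examples:
        parts.append(
            f"{example['schema'].strip()}\n"
            "-- Please answer the following question using the tables above.\n"
            f"-- {example['question'].strip()}\n"
            f"{example['query'].strip()}"
        )
    # The final block ends with a bare SELECT for the model to complete.
    parts.append(f"{target_schema.strip()}\n-- {target_question.strip()}\nSELECT")
    return "\n".join(parts)


examples = [
    {
        "schema": "CREATE TABLE Allergy_Type (\nAllergy VARCHAR(20) PRIMARY KEY,\nAllergyType VARCHAR(20)\n);",
        "question": "Which allergy type has most number of allergies?",
        "query": "SELECT AllergyType FROM Allergy_Type GROUP BY AllergyType ORDER BY count(*) DESC LIMIT 1;",
    }
]
prompt = build_few_shot_prompt(
    "You are a helpful assistant who answers questions about database tables by responding with SQL queries.",
    examples,
    "CREATE TABLE singer (\nSinger_ID int,\nName text\n)",
    "How many singers do we have?",
)
print(prompt)
```

Each worked example ends with a complete, semicolon-terminated query, which shows the model where its own answer should stop.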
We’re probably now reaching the point of diminishing returns from tweaking our prompt. Let’s fine-tune the model to see what further gains can be made.
Fine-Tuning with LoRA
If we’re fine-tuning the model, the first question is what training data to use. Spider includes a training dataset, so this seems like a good place to start. To fine-tune the model we will use QLoRA so that we can efficiently train the model on a single A100 80GB GPU cluster in Databricks, such as Standard_NC24ads_A100_v4. This can be completed in about 4 hours using the 7k records in the Spider training dataset. We have previously discussed fine-tuning with LoRA in an earlier blog post; readers can refer to that post for more details. We can follow standard training recipes using the trl, peft, and bitsandbytes libraries.
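For orientation, a QLoRA setup with these libraries typically combines a 4-bit quantization config with a LoRA adapter config. The fragment below is a sketch rather than the exact configuration used for this post: the LoRA values are taken from the Training Setup table in the appendix, while the NF4 quantization type and bfloat16 compute dtype are common defaults that we assume here.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization of the frozen base weights (the "Q" in QLoRA).
# NF4 and bfloat16 compute are assumed defaults, not values stated in this post.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings from the Training Setup table in the appendix.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

The quantization config is passed when loading the base model and the LoRA config when creating the trainer or wrapping the model, so only the small adapter weights are updated during training.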
Although we’re getting the training records from Spider, we still need to format them in a way that the model can learn from. The goal is to map each record, consisting of the schema (with sample rows), question, and SQL, into a single text string. We start by performing some processing on the raw Spider dataset. From the raw data we produce a dataset where each record consists of three fields: schema_with_rows, question, and query. The schema_with_rows field is derived from the tables corresponding to the question, following the formatting of the CREATE TABLE statement and rows used in the few-shot prompt earlier.
Next, load the tokenizer:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
We’ll define a mapping function that converts each record from our processed Spider training dataset into a text string. We can use apply_chat_template from the tokenizer to conveniently format the text into the chat format expected by the Instruct model. Although this isn’t the exact same format we’re using for our few-shot prompt, the model generalizes well enough to work even when the boilerplate formatting of the prompts is slightly different.
def _mapper(rec):
    # Pull the three fields of a processed Spider record.
    schema = rec["schema_with_rows"].strip()
    question = rec["question"].strip()
    query = rec["query"].strip()

    user_message = USER_MESSAGE_FORMAT.format(schema=schema, question=question)

    messages = [
        {
            "role": "system",
            "content": SYSTEM_PROMPT,
        },
        {"role": "user", "content": user_message},
        # The gold SQL query is the assistant turn the model learns to produce.
        {"role": "assistant", "content": query},
    ]
    # Render the whole conversation as a single training string.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    return {"text": prompt}
For SYSTEM_PROMPT we use the same system prompt used in the few-shot prompt earlier. For USER_MESSAGE_FORMAT we similarly use:
{schema}
Please answer the following question using the tables above.
{question}
With this function defined, all that’s left is to transform the processed Spider dataset with it and save it as a JSONL file.
dataset.map(_mapper)
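The call above only covers the transformation step. As a library-free sketch of the full transform-and-save step, the snippet below applies a mapper to plain Python records and writes one JSON object per line. `write_jsonl`, `toy_mapper`, and the file name are hypothetical names for illustration; `toy_mapper` stands in for `_mapper`, which needs the tokenizer.

```python
import json
import os
import tempfile


def write_jsonl(records, mapper, path):
    """Apply mapper to each record and write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(mapper(record)) + "\n")


def toy_mapper(rec):
    # Stand-in for _mapper: the real one formats the record with the chat template.
    return {"text": f"{rec['schema_with_rows']}\n-- {rec['question']}\n{rec['query']}"}


records = [
    {
        "schema_with_rows": "CREATE TABLE singer (Singer_ID int)",
        "question": "How many singers do we have?",
        "query": "SELECT count(*) FROM singer;",
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "spider_train.jsonl")
    write_jsonl(records, toy_mapper, path)
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()

print(len(lines), json.loads(lines[0])["text"].splitlines()[-1])
```

If `dataset` is a Hugging Face `datasets.Dataset`, note that `map` returns a new dataset rather than modifying the original, so the result needs to be kept; its `to_json` method then writes JSON Lines.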
We are now ready to train. A few hours later we have a fine-tuned Llama3 8B Instruct. Rerunning our few-shot prompt on this new model results in a score of 79.9, another 9 percentage point improvement over our previous score. We have now raised the total score by about 19 percentage points over our simple zero-shot baseline.
 | Easy | Medium | Hard | Extra | All |
---|---|---|---|---|---|
Few-shot with sample rows (Fine-tuned Llama3 8B Instruct) | 91.1 | 85.9 | 72.4 | 54.8 | 79.9 |
Few-shot with sample rows (Llama3 8B Instruct) | 83.9 | 79.1 | 55.7 | 44.6 | 70.8 |
Zero-shot with sample rows (Llama3 8B Instruct) | 80.6 | 75.3 | 51.1 | 41.0 | 67.0 |
Zero-shot (Llama3 8B Instruct) | 78.6 | 69.3 | 42.5 | 31.3 | 60.9 |
You might be wondering how the Llama3 8B Instruct model and the fine-tuned version compare against a larger model such as Llama3 70B Instruct. We repeated the evaluation process using the off-the-shelf 70B model on the dev dataset with eight A100 40 GB GPUs and recorded the results below.
 | Easy | Medium | Hard | Extra | All |
---|---|---|---|---|---|
Few-shot with sample rows (Llama3 70B Instruct) | 89.5 | 83.0 | 64.9 | 53.0 | 76.7 |
Zero-shot with sample rows (Llama3 70B Instruct) | 83.1 | 81.8 | 59.2 | 36.7 | 71.1 |
Zero-shot (Llama3 70B Instruct) | 82.3 | 80.5 | 57.5 | 31.9 | 69.2 |
As expected, comparing the off-the-shelf models, the 70B model beats the 8B model when measured using the same prompt format. What’s surprising is that the fine-tuned Llama3 8B Instruct model scores about 3 percentage points higher than the Llama3 70B Instruct model (79.9 versus 76.7). When focused on specific tasks such as text-to-SQL, fine-tuning can result in small models that are comparable in performance with models that are much larger in size.
Deploy to a Model Serving Endpoint
Llama3 is supported by Mosaic AI Model Serving, so we could even deploy our fine-tuned Llama3 model to an endpoint and use it to power applications. All we need to do is log the fine-tuned model to Unity Catalog and then create an endpoint using the UI. Once it’s deployed we can query it using common libraries.
Wrapping Up
We kicked off our journey with Llama3 8B Instruct on the Spider dev dataset using a zero-shot prompt, achieving a modest score of 60.9. By enhancing this with a few-shot prompt, complete with a system message, multiple examples, and sample rows, we boosted our score to 70.8. Further gains came from fine-tuning the model on the Spider training dataset, taking us to 79.9 on Spider dev and 78.9 on Spider test. This 19-point climb from our starting point, and a 3-point lead over the base Llama3 70B Instruct, would secure a spot in the top-10 results on Spider.
Learn more about how to leverage the power of open source LLMs and the Data Intelligence Platform by registering for Data+AI Summit.
Appendix
Evaluation Setup
Generation was performed using vLLM, greedy decoding (temperature of 0), two A100 80 GB GPUs, and 1024 max new tokens. To evaluate the generations we used the test suite from the taoyds/test-suite-sql-eval repo on GitHub.
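The setup above does not spell out how raw generations are turned into queries before scoring. Because the prompts end with a bare `SELECT` and the system prompt asks for a statement ending with a semicolon, one plausible post-processing step, sketched here purely as an assumption, is to prepend `SELECT` and cut the output at the first semicolon. The helper name `completion_to_sql` is hypothetical.

```python
def completion_to_sql(completion):
    """Rebuild a full SQL statement from text generated after the trailing SELECT."""
    # The prompt already ends with SELECT, so the model only emits the remainder.
    statement = "SELECT " + completion.strip()
    # Keep only the first statement; anything after the first semicolon is discarded.
    first, _, _ = statement.partition(";")
    # Collapse whitespace so multi-line generations become a single line.
    return " ".join(first.split())


sql = completion_to_sql(" count(*)\nFROM singer;\n\n-- How many concerts are there?\nSELECT")
print(sql)
```

This naive split would also truncate a query whose string literal contains a semicolon, so a production pipeline would need a more careful parser.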
Training Setup
Here are the specific details about the fine-tuning setup:
Base Model | Llama3 8B Instruct |
GPUs | Single A100 80GB |
Max Steps | 100 |
Spider train dataset records | 7000 |
LoRA R | 16 |
LoRA Alpha | 32 |
LoRA Dropout | 0.1 |
Learning Rate | 1.5e-4 |
Learning Rate Scheduler | Constant |
Gradient Accumulation Steps | 8 |
Gradient Checkpointing | True |
Train Batch Size | 12 |
LoRA Target Modules | q_proj,v_proj,k_proj,o_proj,gate_proj,up_proj,down_proj |
Data Collator Response Template | <|start_header_id|>assistant<|end_header_id|> |
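A quick back-of-the-envelope calculation shows how much data these settings cover. It assumes that the train batch size in the table above is the per-step micro-batch on the single GPU and that training runs for the full 100 steps.

```python
# Values from the training setup table above.
train_batch_size = 12
gradient_accumulation_steps = 8
max_steps = 100
train_records = 7000

# On a single GPU, each optimizer step consumes batch size x accumulation steps examples.
examples_per_step = train_batch_size * gradient_accumulation_steps
examples_seen = examples_per_step * max_steps
approx_epochs = examples_seen / train_records

print(examples_per_step, examples_seen, round(approx_epochs, 2))  # 96 9600 1.37
```

Under these assumptions the model sees each training record a little more than once.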
Zero-shot Prompt Example
This is the first record from the dev dataset we used for evaluation, formatted as a zero-shot prompt that includes the table schemas. The tables the question concerns are represented using the CREATE TABLE statements that created them.
CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
CREATE TABLE singer (
Singer_ID int,
Name text,
Country text,
Song_Name text,
Song_release_year text,
Age int,
Is_male bool,
PRIMARY KEY (Singer_ID)
)
CREATE TABLE concert (
concert_ID int,
concert_Name text,
Theme text,
Stadium_ID text,
Year text,
PRIMARY KEY (concert_ID),
FOREIGN KEY (Stadium_ID) REFERENCES stadium(Stadium_ID)
)
CREATE TABLE singer_in_concert (
concert_ID int,
Singer_ID text,
PRIMARY KEY (concert_ID,Singer_ID),
FOREIGN KEY (concert_ID) REFERENCES concert(concert_ID),
FOREIGN KEY (Singer_ID) REFERENCES singer(Singer_ID)
)
-- How many singers do we have?
SELECT
Zero-shot with Sample Rows Prompt Example
This is the first record from the dev dataset we used for evaluation, formatted as a zero-shot prompt that includes the table schemas and sample rows. The tables the question concerns are represented using the CREATE TABLE statements that created them. The rows were selected using “SELECT * FROM {table_name} LIMIT 3” on each table, with the column names appearing as a header.
CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
/*
Stadium_ID Location Name Capacity Highest Lowest Average
1 Raith Rovers Stark's Park 10104 4812 1294 2106
2 Ayr United Somerset Park 11998 2363 1057 1477
3 East Fife Bayview Stadium 2000 1980 533 864
*/
CREATE TABLE singer (
Singer_ID int,
Name text,
Country text,
Song_Name text,
Song_release_year text,
Age int,
Is_male bool,
PRIMARY KEY (Singer_ID)
)
/*
Singer_ID Name Country Song_Name Song_release_year Age Is_male
1 Joe Sharp Netherlands You 1992 52 F
2 Timbaland United States Dangerous 2008 32 T
3 Justin Brown France Hey Oh 2013 29 T
*/
CREATE TABLE concert (
concert_ID int,
concert_Name text,
Theme text,
Stadium_ID text,
Year text,
PRIMARY KEY (concert_ID),
FOREIGN KEY (Stadium_ID) REFERENCES stadium(Stadium_ID)
)
/*
concert_ID concert_Name Theme Stadium_ID Year
1 Auditions Free choice 1 2014
2 Super bootcamp Free choice 2 2 2014
3 Home Visits Bleeding Love 2 2015
*/
CREATE TABLE singer_in_concert (
concert_ID int,
Singer_ID text,
PRIMARY KEY (concert_ID,Singer_ID),
FOREIGN KEY (concert_ID) REFERENCES concert(concert_ID),
FOREIGN KEY (Singer_ID) REFERENCES singer(Singer_ID)
)
/*
concert_ID Singer_ID
1 2
1 3
1 5
*/
-- How many singers do we have?
SELECT
Few-shot with Sample Rows Prompt Example
This is the first record from the dev dataset we used for evaluation, formatted as a few-shot prompt that includes the table schemas and sample rows. The tables the question concerns are represented using the CREATE TABLE statements that created them. The rows were selected using “SELECT * FROM {table_name} LIMIT 3” on each table, with the column names appearing as a header.
/*
You are a helpful assistant who answers questions about database tables
by responding with SQL queries. Users will provide you with a set of
tables represented as CREATE TABLE statements. Each CREATE TABLE
statement may optionally be followed by the first few rows from the
table in order to help write the correct SQL to answer questions. After
the CREATE TABLE statements users will ask a question using a SQL
comment starting with two dashes. You should answer the user's question
by writing a SQL statement starting with SELECT and ending with a
semicolon.
*/
CREATE TABLE "Campuses" (
"Id" INTEGER PRIMARY KEY,
"Campus" TEXT,
"Location" TEXT,
"County" TEXT,
"Year" INTEGER
);
/*
Id Campus Location County Year
1 California State University-Bakersfield Bakersfield Kern 1965
2 California State University-Channel Islands Camarillo Ventura 2002
3 California State University-Chico Chico Butte 1887
*/
CREATE TABLE "csu_fees" (
"Campus" INTEGER PRIMARY KEY,
"Year" INTEGER,
"CampusFee" INTEGER,
FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Campus Year CampusFee
1 1996 1951
2 2003 1868
3 1996 2042
*/
CREATE TABLE "degrees" (
"Year" INTEGER,
"Campus" INTEGER,
"Degrees" INTEGER,
PRIMARY KEY (Year, Campus),
FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Year Campus Degrees
1990 1 701
1991 1 681
1992 1 791
*/
CREATE TABLE "discipline_enrollments" (
"Campus" INTEGER,
"Discipline" INTEGER,
"Year" INTEGER,
"Undergraduate" INTEGER,
"Graduate" INTEGER,
PRIMARY KEY (Campus, Discipline),
FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Campus Discipline Year Undergraduate Graduate
1 4 2004 248 0
1 5 2004 811 73
1 6 2004 199 0
*/
CREATE TABLE "enrollments" (
"Campus" INTEGER,
"Year" INTEGER,
"TotalEnrollment_AY" INTEGER,
"FTE_AY" INTEGER,
PRIMARY KEY(Campus, Year),
FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Campus Year TotalEnrollment_AY FTE_AY
1 1956 384 123
1 1957 432 151
1 1958 422 178
*/
CREATE TABLE "faculty" (
"Campus" INTEGER,
"Year" INTEGER,
"Faculty" REAL,
FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Campus Year Faculty
1 2002 357.1
2 2002 48.4
3 2002 742.8
*/
-- Please answer the following question using the tables above.
-- Find the name of the campuses that is in Northridge, Los Angeles or in San Francisco, San Francisco.
SELECT Campus FROM Campuses WHERE Location="Northridge" AND County="Los Angeles"
UNION SELECT Campus FROM Campuses WHERE Location="San Francisco" AND County="San Francisco";
CREATE TABLE Allergy_Type (
Allergy VARCHAR(20) PRIMARY KEY,
AllergyType VARCHAR(20)
);
CREATE TABLE Has_Allergy (
StuID INTEGER,
Allergy VARCHAR(20),
FOREIGN KEY(StuID) REFERENCES Student(StuID),
FOREIGN KEY(Allergy) REFERENCES Allergy_Type(Allergy)
);
CREATE TABLE Student (
StuID INTEGER PRIMARY KEY,
LName VARCHAR(12),
Fname VARCHAR(12),
Age INTEGER,
Sex VARCHAR(1),
Major INTEGER,
Advisor INTEGER,
city_code VARCHAR(3)
);
-- Please answer the following question using the tables above.
-- Which allergy type has most number of allergies?
SELECT AllergyType FROM Allergy_Type GROUP BY AllergyType ORDER BY count(*) DESC LIMIT 1;
CREATE TABLE "building" (
"building_id" text,
"Name" text,
"Street_address" text,
"Years_as_tallest" text,
"Height_feet" int,
"Floors" int,
PRIMARY KEY("building_id")
);
CREATE TABLE "Institution" (
"Institution_id" text,
"Institution" text,
"Location" text,
"Founded" real,
"Type" text,
"Enrollment" int,
"Team" text,
"Primary_Conference" text,
"building_id" text,
PRIMARY KEY("Institution_id"),
FOREIGN KEY ("building_id") REFERENCES "building"("building_id")
);
CREATE TABLE "protein" (
"common_name" text,
"protein_name" text,
"divergence_from_human_lineage" real,
"accession_number" text,
"sequence_length" real,
"sequence_identity_to_human_protein" text,
"Institution_id" text,
PRIMARY KEY("common_name"),
FOREIGN KEY("Institution_id") REFERENCES "Institution"("Institution_id")
);
-- Please answer the following question using the tables above.
-- For each building, show the name of the building and the number of institutions in it.
SELECT T1.name, count(*) FROM building AS T1 JOIN Institution AS T2 ON T1.building_id=T2.building_id GROUP BY T1.building_id;
CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
/*
Stadium_ID Location Name Capacity Highest Lowest Average
1 Raith Rovers Stark's Park 10104 4812 1294 2106
2 Ayr United Somerset Park 11998 2363 1057 1477
3 East Fife Bayview Stadium 2000 1980 533 864
*/
CREATE TABLE singer (
Singer_ID int,
Name text,
Country text,
Song_Name text,
Song_release_year text,
Age int,
Is_male bool,
PRIMARY KEY (Singer_ID)
)
/*
Singer_ID Name Country Song_Name Song_release_year Age Is_male
1 Joe Sharp Netherlands You 1992 52 F
2 Timbaland United States Dangerous 2008 32 T
3 Justin Brown France Hey Oh 2013 29 T
*/
CREATE TABLE concert (
concert_ID int,
concert_Name text,
Theme text,
Stadium_ID text,
Year text,
PRIMARY KEY (concert_ID),
FOREIGN KEY (Stadium_ID) REFERENCES stadium(Stadium_ID)
)
/*
concert_ID concert_Name Theme Stadium_ID Year
1 Auditions Free choice 1 2014
2 Super bootcamp Free choice 2 2 2014
3 Home Visits Bleeding Love 2 2015
*/
CREATE TABLE singer_in_concert (
concert_ID int,
Singer_ID text,
PRIMARY KEY (concert_ID,Singer_ID),
FOREIGN KEY (concert_ID) REFERENCES concert(concert_ID),
FOREIGN KEY (Singer_ID) REFERENCES singer(Singer_ID)
)
/*
concert_ID Singer_ID
1 2
1 3
1 5
*/
-- How many singers do we have?
SELECT