Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
370 views
in Technique[技术] by (71.8m points)

sql server - Set based plan runs slower than scalar valued function with many conditions

This question is a bit more of a hypothetical than an actual code problem. But I have provided a dumbed down version of the code to illustrate the question. Please no comments about the silliness of the code itself. The actual code is too complicated (and proprietary) so this is the best way to proceed.

I have a scalar valued function as follows.

CREATE FUNCTION [dbo].[Compute_value]
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS FLOAT
AS
BEGIN
    IF @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 
    RETURN 0

    IF @bravo IS NULL OR @bravo <= 0
        RETURN 100

    IF (@charle + @delta) / @bravo <= 0
        RETURN 100
    DECLARE @x = DATEDIFF(GETDATE(),'1/1/2000')     
    RETURN @alpha * POWER((100 / @delta), (-2 * POWER(@charle * @bravo, @x/365)))
END

I have heard that table valued functions typically run much faster than scalar valued functions because they aren't RBAR. So I converted the logic to use a #temp_table construct just to benchmark it. Instead of about a dozen IF statements, I would have an equal number of UPDATEs to the #temp_table, and it ran twice as SLOWLY as the scalar UDF.

I thought perhaps this was happening because the UDF could return quickly on the first few conditions, thus causing much of the scalar UDF to be no-ops, but this is not the case. Examining the query execution plans for the #temp_table solution seem to indicate that the updates are causing most of the plan cost.

What might I be missing here? If I convert this to a table value function, am I stuck doing updates to the entire table variable for each conditional statement? Is there a way to avoid this, which seems to be slowing things down considerably? Am I missing something obvious here?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The keyword term here is INLINE TABLE VALUED FUNCTIONS. You have two types of T-SQL tabled valued functions: multi-statement and inline. If your T-SQL function starts with a BEGIN statement then it's going to be crap - scalar or otherwise. You can't get a temp table into an inline table valued function so I'm assuming you went from scalar to mutli-statement table valued function which will probably be worse.

Your inline table valued function (iTVF) should look something like this:

CREATE FUNCTION [dbo].[Compute_value]
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newValue = 
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(<unit of measurement>,GETDATE(),'1/1/2000')/365)))
  END
GO;

Note that, in the code you posted, your DATEDIFF statement is missing the datepart parameter. If should look something like:

@x int = DATEDIFF(DAY, GETDATE(),'1/1/2000')   

Going a little further - it's important to understand why iTVF's are better than T-SQL scalar valued user-defined functions. It's not because table valued functions are faster than scalar valued functions, it's because Microsoft's implementation of T-SQL inline functions are faster than their implementation of T-SQL functions that are not inline. Note the following three functions that do the same thing:

-- Scalar version
CREATE FUNCTION dbo.Compute_value_scalar
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS FLOAT
AS
BEGIN
    IF @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 
    RETURN 0

    IF @bravo IS NULL OR @bravo <= 0
        RETURN 100

    IF (@charle + @delta) / @bravo <= 0
        RETURN 100
    DECLARE @x int = DATEDIFF(dd, GETDATE(),'1/1/2000')     
    RETURN @alpha * POWER((100 / @delta), (-2 * POWER(@charle * @bravo, @x/365)))
END
GO

-- multi-statement table valued function 
CREATE FUNCTION dbo.Compute_value_mtvf
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS  @sometable TABLE (newValue float) AS 
    BEGIN
    INSERT @sometable VALUES
(
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
  END
)
RETURN;
END
GO

-- INLINE table valued function
CREATE FUNCTION dbo.Compute_value_itvf
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newValue = 
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
  END
GO

Now for some sample data and performance test:

SET NOCOUNT ON;
CREATE TABLE #someTable (alpha FLOAT, bravo FLOAT, charle FLOAT, delta FLOAT);
INSERT #someTable
SELECT TOP (100000)
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1, 
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
FROM sys.all_columns a, sys.all_columns b;

PRINT char(10)+char(13)+'scalar'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
FROM #someTable t;

PRINT DATEDIFF(ms, @st, getdate());
GO

PRINT char(10)+char(13)+'mtvf'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f;

PRINT DATEDIFF(ms, @st, getdate());
GO

PRINT char(10)+char(13)+'itvf'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f;

PRINT DATEDIFF(ms, @st, getdate());
GO

Results:

scalar
------------------------------------------------------------
2786

mTVF
------------------------------------------------------------
41536

iTVF
------------------------------------------------------------
153

The scalar udf ran for 2.7 seconds, 41 seconds for the mtvf and 0.153 seconds for the iTVF. To understand why let's look at the estimated execution plans:

enter image description here

You don't see this when you look at the actual execution plan but, with the scalar udf and mtvf, the optimizer calls some poorly executed subroutine for each row; the iTVF does not. Quoting Paul White's career changing article about APPLY Paul writes:

You might find it useful to think of an iTVF as a view that accepts parameters. Just as for views, SQL Server expands the definition of an iTVF directly into the query plan of an enclosing query, before optimization is performed.

The effect is that SQL Server is able to apply its full range of optimizations, considering the query as a whole. It is just as if you had written the expanded query out by hand....

In other words, iTVF's enable to optimizer to optimize the query in ways that just aren't possible when all that other code needs to be executed. One of many other examples of why iTVFs are superior is they are the only one of the three aforementioned function types that allow parallelism. Let's run each function one more time, this time with the Actual Execution plan turned on and with traceflag 8649 (which forces a parallel execution plan):

-- don't need so many rows for this test
TRUNCATE TABLE #sometable;
INSERT #someTable 
SELECT TOP (10)
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1, 
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
FROM sys.all_columns a;

DECLARE @x float;

SELECT TOP (10) @x = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
FROM #someTable t
ORDER BY dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
OPTION (QUERYTRACEON 8649);

SELECT TOP (10)  @x = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f
ORDER BY f.newValue
OPTION (QUERYTRACEON 8649);

SELECT @x = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f
ORDER BY f.newValue
OPTION (QUERYTRACEON 8649);

Execution plans:

enter image description here

Those arrows you see for the iTVF's execution plan is parallelism - all your CPU's (or as many as your SQL instance's MAXDOP settings allow) working together. T-SQL scalar and mtvf UDFs can't do that. When Microsoft introduces inline scalar UDFs then I'd suggest those for what you're doing but, until then: if performance is what you're looking for then inline is the only way to go and, for that, iTVFs are the only game in town.

Note that I continuously emphasized T-SQL when talking about functions... CLR Scalar and Table valued functions can be just fine but that's a different topic.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...