Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
292 views
in Technique[技术] by (71.8m points)

sql - Using MySQL sort varchar column numerically with cast as unsigned when the column can begin or end with letters

I've ran into something I'm not really sure how to handle here. I'm building a database to store information on sports cards, and I'm having a bit of an issue with some sorting when I want to see certain cards.

For background, each card (row in the database) has information on year, set the card is from, player on the card, and card number (there's more info than that, but this is all that's relevant here). When I see results, I want things to be sorted by year, then set, then player, then card number. Everything but card number is working fine, as year is just an integer, and set and player are both varchars, so it's easy to sort those. However, the card number is what I'm running into some issues with.

The card number column is a varchar since the card number can include letters, numbers, and dashes. Most commonly, a card number will be a straight number (i.e. 1, 2, 3, 4, 5), straight letters (Ex-A, Ex-B, Ex-C), a number followed by a letter (1a, 1b, 2, 3a, 3b, 3c), or a letter followed by a number (A1, A2, A3, A4, A5). This is how I currently have the sort portion of my SQL string set up:

order by year desc, cardset asc, subset asc, cast(cardNum as unsigned) asc;

This is handling MOST things fine. But what I'm having issues with is when a group of cards have the same letters in their card number, and then have a number. I want the sort to essentially ignore the leading letters and then just sort by the numbers. But, sometimes it doesn't do this correctly, particularly, when there's more than 5ish cards with this to sort.

Specifically, it's incorrectly sorting some cards with the following card numbers into the following order:

  • BCP61
  • BCP97
  • BCP32
  • BCP135

When it should result in:

  • BCP32
  • BCP61
  • BCP97
  • BCP135

It is currently sorting straight numbers, or numbers followed by letters correctly (i.e. 1, 2, 3, 4, 5, or 1, 2, 3a, 3b, 4, 5a, 5b). I'm not aware of any issues with it sorting straight letters incorrectly, but I also currently have very few test cases of this, so I'm not sure if it's 100% or not.

In addition to not knowing how to modify my SQL sort statement without messing up other sorts, I don't really know how it's coming up with the order it is for the BCP example above. Any thoughts on how to correct it? I thought about trying to ignore letters in card number until we get to numbers, but that would cause major issues for cards with only letters in the card number. So I'm a bit stuck.

Absolute worst comes to worst, I could probably split the card number column into 2 different columns, one for the part that is more descriptive, and one for the part I want to sort by. That would probably end up working just fine, but would require a lot of work to get things back!

Edit- Here is some more information including data in my DB (sorry for the formatting, no idea how to do tables here):

| year | cardSet | subset                     | cardNum |
| 2016 | Bowman  |                            | 52      |
| 2016 | Bowman  |                            | 54      |
| 2016 | Bowman  |                            | 147     |
| 2016 | Bowman  | Chrome Prospects           | BCP32   |
| 2016 | Bowman  | Chrome Prospects           | BCP61   |
| 2016 | Bowman  | Chrome Prospects           | BCP97   |
| 2016 | Bowman  | Chrome Prospects           | BCP135  |
| 2016 | Topps   |                            | 1       |
| 2016 | Topps   |                            | 2a      |
| 2016 | Topps   |                            | 2b      |
| 2016 | Topps   |                            | 3       |

I would expect my sort to spit out results in the following order:

  • 2016 Bowman 52
  • 2016 Bowman 54
  • 2016 Bowman 147
  • 2016 Bowman Chrome Prospects BCP32
  • 2016 Bowman Chrome Prospects BCP61
  • 2016 Bowman Chrome Prospects BCP97
  • 2016 Bowman Chrome Prospects BCP125
  • 2016 Topps 1
  • 2016 Topps 2a
  • 2016 Topps 2b
  • 2016 Topps 3

However, here is the results I am given with my above sorting statement:

  • 2016 Bowman 52
  • 2016 Bowman 54
  • 2016 Bowman 147
  • 2016 Bowman Chrome Prospects BCP62
  • 2016 Bowman Chrome Prospects BCP97
  • 2016 Bowman Chrome Prospects BCP32
  • 2016 Bowman Chrome Prospects BCP125
  • 2016 Topps 1
  • 2016 Topps 2a
  • 2016 Topps 2b
  • 2016 Topps 3

It handles card numbers with just numbers, or numbers followed by letters just fine, but it tends to mess things up when the card number begins with letters and is followed by numbers.

I have tried using the length() trick in the comments so that the sort part of my SQL is:

order by year desc, cardset asc, subset asc, length(cardNum), cardNum asc

That does fix the issue I was describing above, but messes up the 'Topps' part in my example- It'll put cards with letters following a number last no matter what. Here's the order I get with that sort:

  • 2016 Bowman 52
  • 2016 Bowman 54
  • 2016 Bowman 147
  • 2016 Bowman Chrome Prospects BCP32
  • 2016 Bowman Chrome Prospects BCP61
  • 2016 Bowman Chrome Prospects BCP97
  • 2016 Bowman Chrome Prospects BCP125
  • 2016 Topps 1
  • 2016 Topps 3
  • 2016 Topps 2a
  • 2016 Topps 2b
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

MariaDB 10 and MySQL 8 support REGEXP_REPLACE. Using that, you can define a custom function:

DROP FUNCTION IF EXISTS zerofill_numbers;
CREATE FUNCTION zerofill_numbers(str TEXT, len TINYINT)
    RETURNS text
    NO SQL
    DETERMINISTIC
    RETURN REGEXP_REPLACE(
        REGEXP_REPLACE(str, '(\d+)', LPAD('\1', len+2, 0)),
        REPLACE('0*(\d{$len})', '$len', len),
        '\1'
    );

Now given the following test data:

DROP TABLE IF EXISTS `strings`;
CREATE TABLE IF NOT EXISTS `strings` (`str` text);
INSERT INTO `strings` (`str`) VALUES
    ('Bowman 52'),
    ('Bowman 54'),
    ('Bowman 147'),
    ('Bowman Chrome Prospects BCP32'),
    ('Bowman Chrome Prospects BCP61'),
    ('Bowman Chrome Prospects BCP97'),
    ('Bowman Chrome Prospects BCP125'),
    ('Topps 1'),
    ('Topps 3'),
    ('Topps 2a'),
    ('Topps 2b'),
    ('v9.9.3'),
    ('v9.10.3'),
    ('v11.3.4'),
    ('v9.9.11'),
    ('v11.3'),
    ('0.9'),
    ('0.11'),
    ('s09'),
    ('s11'),
    ('s0'),
    ('v9.0.1');

We can sort it with:

SELECT s.str
FROM strings s
ORDER BY zerofill_numbers(s.str, 10)

Here is the result:

0.9
0.11
Bowman 52
Bowman 54
Bowman 147
Bowman Chrome Prospects BCP32
Bowman Chrome Prospects BCP61
Bowman Chrome Prospects BCP97
Bowman Chrome Prospects BCP125
s0
s09
s11
Topps 1
Topps 2a
Topps 2b
Topps 3
v9.0.1
v9.9.3
v9.9.11
v9.10.3
v11.3
v11.3.4

The function will fill any number in in the string with zeros until it has the defined length.

Note 1: This will not sort decimal numbers correctly (see 0.9 and 0.11). You should also not try to use it for signed numbers.

Note 2: This function is based on the following answer: https://stackoverflow.com/a/46099386/5563083 - So if you find this answer helpfull, go and upvote the source.

Note 3: If you don't want to define a custom function, you can use the same method inline:

SELECT *
FROM strings
ORDER BY
  REGEXP_REPLACE(REGEXP_REPLACE(str, '(\d+)', LPAD('\1', 10+2, 0)), '0*(\d{10})', '\1')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...