SQL基础操作_5_字符串处理
目录
7.6 处理字符串
7.6.1 生成自增值
7.6.2 遍历字符串里的每个值
7.6.3 处理含引号的字符串
7.6.4 计算某个字符出现的次数
7.6.5 字符串里过滤不需要的字符
7.6.6 拆分字符串里的字符和数字
7.6.7 判断字符串是字符串数字型
7.6.8 判断字符串含有汉字
7.6.9 合并多行到一行
7.6.10 对字符串重新按字母排序重新组合
7.6.11 判断一个字符是否是数字
7.6.12 按照指定的位置截取字符
7.6.13 按照指定的分隔符截取字符返回表形式
7.6 处理字符串
注:数据集和表结构见 SQL基础操作_1_检索数据
7.6.1 生成自增值
需求:通过SQL生成一个1到1000条记录.
解决方法:通过CTE的递归来实现该需求.
SQLServer:
DECLARE @startINT, @endINT SELECT @start=1, @end=1000 ;WITH NumberSequence( Number) AS (SELECT @start ASNumberUNION ALLSELECT Number + 1FROM NumberSequenceWHERE Number <@end ) SELECT * FROM NumberSequence OPTION (MaxRecursion 1000)执行结果:
| Number |
| 1 |
| 2 |
| 3 |
| ... |
|
|
Oracle:
WITH t(num) AS (SELECT 1 FROM DUALUNION ALLSELECT t.num+1FROM t WHERE t.num<100) SELECT * FROM t;执行结果:
| Num |
| 1 |
| 2 |
| 3 |
| ... |
|
|
Mysql(8.0及以上版本):
WITH RECURSIVE cte (num) AS (SELECT 1UNION ALLSELECT num+1 FROM cte WHERE num <100 ) SELECT * FROM cte;执行结果:
| Num |
| 1 |
| 2 |
| 3 |
|
|
| … |
7.6.2 遍历字符串里的每个值
需求:打印出ename为’King’的名字里每一个字母,每个字母占一行.
解决方法:通过自增表和emp表先cross join(笛卡尔积),然后再通过ename的len(ename的长度)进行过滤,最终得到显示每个字母的结果.
SQLServer:
SELECT SUBSTRING (e.ENAME,seq.pos,1) AS ename_Split FROM (SELECT ENAME FROM emp WHERE ename= 'KING' ) e, (SELECT number AS pos FROM master.[dbo].[spt_values] WHERE type = 'P' AND number>0)seq WHERE seq.pos <= LEN(e.ename)执行结果:
| ename_Split |
| K |
| I |
| N |
| G |
注:
1: 这里master.[dbo].[spt_values]是一张特殊的系统视图,里面存了从0到2047总2048条自增序列.
2: 如果不明白,可以分段来看.
Step1:
SELECT e.*,seq.* FROM (SELECT ENAME FROM emp WHERE ename= 'KING' ) e, (SELECT number AS pos FROM master.[dbo].[spt_values] WHERE type = 'P' AND number>0)seq执行结果:
| ENAME | pos |
| KING | 1 |
| KING | 2 |
| KING | 3 |
|
| |
| … | … |
Sept2:
SELECT SUBSTRING(e.ENAME,seq.pos,1)AS ename_Split FROM .. e, ..seq WHERE seq.pos<=LEN(e.ename)这里通过SUBSTRNG函数,每次的开始位置不断调整,每次仅取一个字符,再通过LEN函数过滤.所以得到最终结果.如果不熟悉SUBSTRING的语法,这里简单介绍下.
SUBSTRING ( expression, start, length )
1) 参数expression是要截取的原始字符串,比如这里的” KING”
2) 参数start是要截取的位置,比如从第2个位置开始,那应该从” I”往后数.
3) 参数length是要截取的长度,沿用上一行的例子,如果长度定义为2,则最终截取字符串是”IN”
Oracle:
WITH t(num) AS (SELECT 1 FROM DUALUNION ALLSELECT t.num+1FROM t WHEREt.num<100) SELECT SUBSTR(e.ENAME,seq.num,1) AS ename_Split FROM (SELECT ENAME FROM emp WHERE ename ='KING' ) e, (SELECT num FROM t)seq WHERE seq.num <= LENGTH(e.ename)Mysql 8.0:
WITH RECURSIVE cte (num)AS (SELECT 1UNION ALLSELECT num+1 FROM cte WHERE num <100 ) SELECT SUBSTRING(e.ENAME,seq.num,1) AS ename_Split FROM (SELECT ENAME FROM emp WHERE ename ='KING' ) e, (SELECT num FROM cte)seq WHERE seq.num <= LENGTH(e.ename)
7.6.3 处理含引号的字符串
需求:往dept表里插入dname为Test’s,loc为Beijing,deptno为100的数据.
解决方法:这里有位引号是特殊符号,所以需要特殊处理,比如如果双引号包裹起来.
Mysql:
INSERT INTO dept VALUES(100,'Test\'s','Beijing');Sql Server:
BEGIN TRANSET IDENTITY_INSERTdeptON;GOINSERT INTO dept(deptno,dname,loc) VALUES (100,'Test''s','Beijing');SELECT * FROM dept WHERE deptno=100;SET IDENTITY_INSERTdeptOFF;GO ROLLBACK TRAN执行结果:
| deptno | dname | loc |
| 100 | Test's | Beijing |
7.6.4 计算某个字符出现的次数
需求:查询emp表emptno是7499的用户的job里S出现的次数.
解决方法:这里length(len)结合replace函数算出字符串出现的次数.
Sql Server:
SELECT empno,job,(LEN(JOB) - LEN(REPLACE(JOB,'S','')))/LEN('S') AS StrFreq FROM emp WHERE empno=7499;| empno | job | StrFreq |
| 7499 | SALESMAN | 2 |
Mysql:
SELECT empno,job,ROUND((LENGTH(JOB) - LENGTH(REPLACE(JOB,'S','')))/LENGTH('S')) AS StrFreq FROM emp WHERE empno=7499;注:这里除以LENGTH('S')是为了考虑传入的字符串是2位以及以上的情况,比如’SS’.
7.6.5 字符串里过滤不需要的字符
需求:过滤tmp_v视图里含数字的部分. 其中tmp_v视图的data字段的定义是emp表的ename字段拼接空格和deptno字段。
解决方法:这里通过translate函数对含数字的部分进行替换.
SQL Server:
create view tmp_v AS SELECT ename+' '+cast(deptno as varchar)as data from empSELECT data,replace(dbo.translate(data,'0123456789','@@@@@@@@@@'),'@','') as ename FROM tmp_v order by replace(dbo.translate(data,'0123456789','@@@@@@@@@@'),'@','') desc| data | ename |
| WARD 30 | WARD |
| TURNER 30 | TURNER |
| SMITH 20 | SMITH |
| SCOTT 20 | SCOTT |
| MILLER 10 | MILLER |
| MARTIN 30 | MARTIN |
| KING 10 | KING |
| JONES 20 | JONES |
| JAMES 30 | JAMES |
| FORD 20 | FORD |
| CLARK 10 | CLARK |
| BLAKE 30 | BLAKE |
| ALLEN 30 | ALLEN |
| ADAMS 20 | ADAMS |
注:这里需要参考之前章节里translate函数的实现.
Mysql:
create view tmp_v AS SELECT CONCAT(ename,' ',deptno) as data from emp SELECT data,replace(translate(data,'0123456789','@@@@@@@@@@'),'@','') as ename FROM tmp_v order by replace(translate(data,'0123456789','@@@@@@@@@@'),'@','') descOracle:
7.6.6 拆分字符串里的字符和数字
需求:过滤tmp_v视图里data字段拆分会原来的ename和deptno两个字段.
解决方法:这里通过translate、replace、repeate(replicate、rpad)函数对含数字的部分进行替换.
SQL Server:
SELECT data,replace(dbo.translate(data,'0123456789',REPLICATE('@',10)),'@','') as ename, replace(dbo.translate(data,'ABCDEFGHIJKLMNOPQRSTUVWXYZ',REPLICATE('@',26)),'@','') as deptno FROM tmp_v order by replace(dbo.translate(data,'0123456789',REPLICATE('@',10)),'@','') desc| data | ename | deptno |
| WARD 30 | WARD | 30 |
| TURNER 30 | TURNER | 30 |
| SMITH 20 | SMITH | 20 |
| SCOTT 20 | SCOTT | 20 |
| MILLER 10 | MILLER | 10 |
| MARTIN 30 | MARTIN | 30 |
| KING 10 | KING | 10 |
| JONES 20 | JONES | 20 |
| JAMES 30 | JAMES | 30 |
| FORD 20 | FORD | 20 |
| CLARK 10 | CLARK | 10 |
| BLAKE 30 | BLAKE | 30 |
| ALLEN 30 | ALLEN | 30 |
| ADAMS 20 | ADAMS | 20 |
Mysql:
SELECT data,replace(translate(data,'0123456789',REPEAT('@',10)),'@','') as ename, replace(translate(data,'ABCDEFGHIJKLMNOPQRSTUVWXYZ',REPEAT('@',26)),'@','') as deptno FROM tmp_v order by replace(translate(data,'0123456789',REPEAT('@',10)),'@','') desc
7.6.7 判断字符串是字符串数字型
需求:检索temp_strdata表的字段data是字符串数字类型的记录这里如果都是字符串或者数字的也符合条件.
解决方法:这里通过translate、replace、repeate(replicate、rpad)函数对含数字的部分进行替换.
SqlServer:
CREATE TABLE temp_str(data VARCHAR(1000)); INSERT INTO temp_str VALUES('SMITH20'); INSERT INTO temp_str VALUES('JONES30'); INSERT INTO temp_str VALUES('Jim#40'); INSERT INTO temp_str VALUES('50$Tom'); INSERT INTO temp_str VALUES('60:Mike'); INSERT INTO temp_str VALUES('70Cruz'); INSERT INTO temp_str VALUES('Jack'); INSERT INTO temp_str VALUES('J8oh0n');SELECT data --,dbo.translate(UPPER(data),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',REPLICATE('a',36)) as trans FROM temp_str WHERE dbo.translate(UPPER(data),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',REPLICATE('a',36))=REPLICATE('a',LEN(data))7.6.8 判断字符串含有汉字
需求:检索含有汉字的字符串.
解决方法:这里通过函数CHAR_LENGTH对比LENGTH进行对比来判断.
Mysql:
SELECT data,LENGTH(data) AS Len_data,CHAR_LENGTH(data) AS CharLen_Data,HEX(data) AS HexData FROM ( SELECT 'Hello,World,SQL'AS dataUNION ALL SELECT 'Data,Arithmetic' AS dataUNION ALL SELECT 'Science中国' AS dataUNION ALL SELECT '上S海H' AS data )A WHERE LENGTH(data) <> CHAR_LENGTH(data)执行结果:
| data | Len_data | CharLen_Data |
| Science中国 | 13 | 9 |
| 上S海H | 8 | 4 |
注:
LENGTH() returnsthe length of the string measured in bytes.
CHAR_LENGTH() returns the length of the string measured incharacters.
LENGTH:是计算字节的长度.一个汉字是算三个字符,一个数字或字母算一个字符
CHAR_LENGTH:汉字、数字、字母都算是一个字符
或者通过字符串的十六进制并结合REGEXP来判断.
SELECT data,HEX(data) AS HexData FROM ( SELECT 'Hello,World,SQL' AS dataUNION ALL SELECT 'Data,Arithmetic'AS dataUNION ALL SELECT 'Science中国' AS dataUNION ALL SELECT '上S海H' AS data )A WHERE HEX(data) REGEXP'^(..)*(E[4-9])'执行结果:
| data | HexData |
| Science中国 | 536369656E6365E4B8ADE59BBD |
| 上S海H | E4B88A53E6B5B748 |
SQL Server:
SELECT data FROM ( SELECT 'Hello,World,SQL' AS dataUNION ALL SELECT 'Data,Arithmetic' AS dataUNION ALL SELECT 'Science中国' AS dataUNION ALL SELECT '上S海H' AS data )A WHERE data LIKE '%[吖-座]%'-- 或者利用PATINDEX函数进行判断 SELECT data FROM ( SELECT 'Hello,World,SQL' AS dataUNION ALL SELECT 'Data,Arithmetic' AS dataUNION ALL SELECT 'Science中国' AS dataUNION ALL SELECT '上S海H' AS data )A WHERE PATINDEX ('%[吖-座]%',data) > 0
执行结果:
| data |
| Science中国 |
| 上S海H |
7.6.9 合并多行到一行
需求:将emp表里deptno相同的ename以逗号拼接在一起.
解决方法:这里通过字符串合并函数完成该效果.如group_concat
SQL Server:
SELECT DISTINCTdeptno, STUFF((SELECT N', '+ CAST([ename]ASVARCHAR(255))FROM emp e1WHERE e1.deptno= e2.deptnoFOR XML PATH ('')), 1, 2,'')AS StrCombine FROM emp e2执行结果:
| Deptno | StrCombine |
| 10 | CLARK, KING, MILLER |
| 20 | SMITH, JONES, SCOTT, ADAMS, FORD |
| 30 | ALLEN, WARD, MARTIN, BLAKE, TURNER, JAMES |
或者借助CTE:
WITH x(deptno,cnt,list,empno,len) AS( SELECT deptno,COUNT(*) OVER(PARTITIONBY deptno),CAST(ename AS VARCHAR(100)),empno,1 FROM emp UNION ALL SELECT x.deptno,x.cnt,CAST(x.list+','+e.ename AS VARCHAR(100)),e.empno,x.len+1 FROM emp e,x WHERE e.deptno= x.deptno AND e.empno> x.empno ) SELECT deptno,list AS StrCombine from X WHERE len=cnt ORDER BY 1步骤分析:
Step1: 首先借助COUNT(*) OVER(PARTITION BY deptno)确定每个部门里有多少员工
Step2: 借助len,初始值为1,完成自增(递归里 x.len+1)
Step3: 借助x.list+','+e.ename完成ename的拼接
Step4: 借助e.deptno = x.deptno AND e.empno >x.empno确定要递归的结果集
Step5: 查询满足条件的记录,即按deptno分组的empno数和自增序号相同的记录.
分步查看结果:
1) 查看构建的递归数据.这里以depto=10的为例:
SELECT * FROM x WHERE deptno= 10| deptno | cnt | list | empno | len |
| 10 | 3 | CLARK | 7782 | 1 |
| 10 | 3 | KING | 7839 | 1 |
| 10 | 3 | MILLER | 7934 | 1 |
| 10 | 3 | KING,MILLER | 7934 | 2 |
| 10 | 3 | CLARK,KING | 7839 | 2 |
| 10 | 3 | CLARK,MILLER | 7934 | 2 |
| 10 | 3 | CLARK,KING,MILLER | 7934 | 3 |
2) 不难发现,这里高亮处的数据是我们想要的,所以通过如下方式获取最终结果:
SELECT deptno,list AS StrCombine FROM X WHERE len=cnt ORDER BY 1延展阅读:如果仅想获得某一个分组下的字符串合并,也可以按照如下方法:
DECLARE @combinedString VARCHAR(MAX) SELECT @combinedString = COALESCE(@combinedString+', ','')+ ename FROM emp WHERE deptno=10 SELECT @combinedString as StringValueMysql:
SELECT deptno,group_concat(ename) AS StrCombine FROM emp GROUP BY deptno ORDER BY emp.deptno,ename
7.6.10 对字符串重新按字母排序重新组合
需求:将emp表里ename按照字母顺序重新组合生成新的字符.
解决方法:这里通过字符串合并函数或者结合substring和row_number完成该效果.
SqlServer:
WITH x(ename,ename_Split) AS ( SELECT TOP 100000 ename,SUBSTRING(e.ENAME,seq.pos,1) AS ename_SplitFROM (SELECT ENAME FROM emp) e,(SELECT number AS pos FROM master.[dbo].[spt_values] WHERE type= 'P' AND number>0) seqWHERE seq.pos <= LEN(e.ename)ORDER BY ename, ename_Split)SELECT DISTINCTename, STUFF((SELECT N''+ CAST(ename_SplitASVARCHAR(255))FROM x e1WHERE e1.ename= e2.enameFOR XML PATH ('')),1,0,'')AS StrByAlph FROM x e2注: 如果想在CTE里使用[]排序,需要在查询里指定TOP.
消息 1033,级别 15,状态 1,第 67 行
除非另外还指定了 TOP、OFFSET 或 FORXML,否则,ORDER BY 子句在视图、内联函数、派生表、子查询和公用表表达式中无效。
执行结果:
| ename | StrByAlph |
| ADAMS | AADMS |
| ALLEN | AELLN |
| BLAKE | ABEKL |
| CLARK | ACKLR |
| FORD | DFOR |
| JAMES | AEJMS |
| JONES | EJNOS |
| KING | GIKN |
| MARTIN | AIMNRT |
| MILLER | EILLMR |
| SCOTT | COSTT |
| SMITH | HIMST |
| TURNER | ENRRTU |
| WARD | ADRW |
|
|
|
Mysql:
SELECT ename,group_concat(SUBSTRING(e.ENAME,seq.num,1) ORDER BY SUBSTRING(e.ENAME,seq.num,1) separator '') AS StrByAlph FROM (SELECT ENAME FROM emp) e,(SELECT i AS num FROM tb_incr)seq WHERE seq.num <= LENGTH(e.ename) GROUP BY ename ORDER BY ename,SUBSTRING(e.ENAME,seq.num,1)
注: 这里借助group_concat函数里的ORDER BY关键字,对已经排序的字母进行合并.
这里tb_incr是个自增表,存放从1开始步长为1的自增序列
Oracle:
7.6.11 判断一个字符是否是数字
需求:将临时表里判断data字段里哪些是数字.
解决方法:这里通过函数isnumberic或者regexp完成该效果.
Mysql:
DELIMITER $$ DROP FUNCTION IF EXISTS `IsNum` $$ CREATE FUNCTION `IsNum`(str VARCHAR(25)) RETURNS INT BEGINDECLARE iResult INT DEFAULT 0;IF ISNULL(str) THEN return 0;END IF;-- NULL 字符串IF str='' THEN return 0;END IF;-- 空字符串SELECT str REGEXP '^[0-9]*$' INTO iResult;IF iResult =1 THENRETURN 1;ELSERETURN 0;END IF;END $$ DELIMITER ;或者使用正则表达式:
SELECT data FROM ( SELECT '63' AS data UNION ALL SELECT '36(' AS data UNION ALL SELECT '3(6' AS data UNION ALL SELECT '(36' AS data UNION ALL SELECT '36$' AS data UNION ALL SELECT '' AS data UNION ALL SELECT NULL AS data )A -- WHERE IsNum(data) = 1 WHERE data REGEXP '^[0-9]*$'=1 AND data IS NOT NULL AND data<> '';或者直接通过函数IsNum(data) = 1来判断,见注释部分.
SqlServer:
SELECT data FROM ( SELECT '63' AS data UNION ALL SELECT '36(' AS data UNION ALL SELECT '3(6' AS data UNION ALL SELECT '(36' AS data UNION ALL SELECT '36$'AS data UNION ALL SELECT '' AS data UNION ALL SELECT NULL AS data )A WHERE ISNUMERIC(data)= 1执行结果:
| Data |
| 63 |
7.6.12 按照指定的位置截取字符
需求:按照逗号拆分字符串,取拆分出来的第二个子串.
解决方法:这里需要自定义函数结合substring截取字符串,以达到该效果.
SQL Server:
CREATE FUNCTION strSplitIndex ( @str VARCHAR(1024), --要分割的字符串@split VARCHAR(10), --分隔符@index INT --要取元素的位置 ) RETURNS VARCHAR(1024) AS BEGIN DECLARE @location INT DECLARE @start INT DECLARE @next INT DECLARE @seed INT SET @str=LTRIM(RTRIM(@str)) SET @start=1 SET @next=1 SET @seed=LEN(@split) SET @location=CHARINDEX(@split,@str) WHILE @location<>0and @index>@next BEGIN SET @start=@location+@seed SET @location=CHARINDEX(@split,@str,@start) SET @next=@next+1 END IF @location =0 SELECT @location =LEN(@str)+1 RETURN SUBSTRING(@str,@start,@location-@start) END GO SELECT dbo.strSplitIndex(data,',',1) AS StrSplit FROM (SELECT 'Hello,World,SQL' AS dataUNION ALLSELECT 'Data,Arithmetic' AS dataUNION ALLSELECT 'Science' AS data)A执行结果:
| StrSplit |
| Hello |
| Data |
| Science |
或者借助parsename函数:
SELECT PARSENAME(REPLACE(data,',','.'),2) AS StrSplit FROM (SELECT 'Hello,World,SQL' AS dataUNION ALLSELECT 'Data,Arithmetic' AS dataUNION ALLSELECT 'Science' AS data )A WHERE PARSENAME(REPLACE(data,',','.'),2)IS NOT NULL执行结果:
| StrSplit |
| Hello |
| Data |
Mysql:
SELECT data,SUBSTRING_INDEX(SUBSTRING_INDEX(data,',',seq.num),',',-1) AS sub,seq.num AS subStrPos FROM (SELECT 'Hello,World,SQL' AS dataUNION ALL SELECT 'Data,Arithmetic' AS dataUNION ALL SELECT 'Science' AS data) e, (SELECT ias num FROM tb_incr)seq WHERE seq.num <= LENGTH(e.data)- LENGTH(REPLACE(e.data,',',''))+1 AND seq.num=2 ORDER BY data,seq.num执行结果:
| data | sub | subStrPos |
| Data,Arithmetic | Arithmetic | 2 |
| Hello,World,SQL | World | 2 |
步骤解析:
Step1: 首先借助自增表将data字段里的数据按照逗号的数目切分,如果有2个逗号,则会切分成3部分
Step2: 借助SUBSTRING_INDEX函数截取逗号所在位置的子串,这里鉴于SUBSTRING_INDEX的第三个参数的意义是子串累加,所以又套了个SUBSTRING_INDEX,第三个参数传-1,即从右边截取.
Step3:借助自增表的num,取指定分割位置的数据,这里是2.
注: SUBSTRING_INDEX函数执行示例见下:
SELECT SUBSTRING_INDEX('Hello,World,SQL',',',1) AS SUBSTRING,1 AS pos UNION ALL SELECT SUBSTRING_INDEX('Hello,World,SQL',',',2) AS SUBSTRING,2 AS pos UNION ALL SELECT SUBSTRING_INDEX('Hello,World,SQL',',',3) AS SUBSTRING,3 AS pos执行结果:
| SUBSTRING | pos |
| Hello | 1 |
| Hello,World | 2 |
| Hello,World,SQL | 3 |
Oracle:
7.6.13 按照指定的分隔符截取字符返回表形式
需求:按照逗号拆分字符串,并指定返回的格式是表.
解决方法:这里需要自定义函数结合substring截取字符串,以达到该效果.
SQLServer:
CREATE FUNCTION strSplitTable(@strNVARCHAR(2000),@splitNVARCHAR(2)) RETURNS @t TABLE(SubStr VARCHAR(1000)) AS BEGIN DECLARE @tmpSubStr VARCHAR(1000),@getIndexINT SET @getIndex=CHARINDEX(',',@str) WHILE(@getIndex<>0) BEGINSET @tmpSubStr=CONVERT(VARCHAR(1000),SUBSTRING(@str,1,@getIndex-1))INSERT INTO @t(SubStr)VALUES(@tmpSubStr)SET @str=STUFF(@str,1,@getIndex,'')SET @getIndex=CHARINDEX(',',@str) END INSERT INTO @t(SubStr)VALUES(@str) RETURN END GO SELECT * FROM strSplitTable('Hello,World,SQL',',')执行结果:
| SubStr |
| Hello |
| World |
| SQL |
注: 自Sql Server 2016已新增系统函数STRING_SPLIT,测试示例见下:
SELECT A.value value_A,B.value value_B FROM ( SELECT value FROM STRING_SPLIT('A$B$C','$') )A LEFT JOIN (SELECT value FROM STRING_SPLIT('A,B',',') )B ON A.value = B.value执行结果:
| value_A | value_B |
| A | A |
| B | B |
| C | NULL |
这里仅支持单个字符的分隔符.
消息 214,级别 16,状态 11,第 12 行
Procedure expects parameter 'separator' of type 'nchar(1)/nvarchar(1)'.
Mysql:
未见该方法.
总结
以上是生活随笔为你收集整理的SQL基础操作_5_字符串处理的全部内容,希望文章能够帮你解决所遇到的问题。
- 上一篇: A股工业4.0板块的股票 作为投资者得明
- 下一篇: SQL基础操作_3_数据字典(涵盖SQL