欢迎访问 生活随笔!

生活随笔

当前位置: 首页 >

数据中台-数据采集

发布时间:2024/3/12 47 豆豆
生活随笔 收集整理的这篇文章主要介绍了 数据中台-数据采集 小编觉得挺不错的,现在分享给大家,帮大家做个参考.

数据导图

采集不同数据库库、表、字段、函数、外键、存储过程等,为后续清洗做准备

数据采集

1. 获取数据库所有表

SELECT TABLE_NAME AS TB_NAME,TABLE_NAME AS EN_NAME,TABLE_COMMENT COMMENTS,CREATE_TIME AS CREATE_TIMEFROM INFORMATION_SCHEMA.TABLESWHERE table_type = 'BASE TABLE' AND TABLE_SCHEMA = (SELECT DATABASE())

2. 获取数据库所有字段

SELECT ORDINAL_POSITION AS RN,TABLE_NAME AS TB_NAME,COLUMN_NAME AS EN_NAME,DATA_TYPE AS DATA_TYPE,CASE WHEN data_type = 'float' OR data_type = 'double' OR data_type = 'decimal' THEN NUMERIC_PRECISION ELSE CHARACTER_MAXIMUM_LENGTH END AS DATA_LENGTH,NUMERIC_SCALE AS PRECISION_LENGTH,CASE EXTRA WHEN 'auto_increment' THEN 1 ELSE 0 END AS IS_IDENTITY,CASE WHEN COLUMN_KEY = 'PRI' THEN 1 ELSE 0 END AS IS_PK,CASE WHEN IS_NULLABLE = 'NO' THEN 0 ELSE 1 END AS IS_NULL,COLUMN_DEFAULT AS DEFAULT_VALUE,COLUMN_COMMENT AS COMMENTSFROM information_schema.columnsWHERE table_schema = (SELECT DATABASE()) ORDER BY TB_NAME,RN

3. 获取表所有的外键

SELECTC.TABLE_SCHEMA,C.REFERENCED_TABLE_NAME FK_CONN_TABLE,C.REFERENCED_COLUMN_NAME FK_CONN_COLUMNS,C.TABLE_NAME TB_NAME,C.COLUMN_NAME FK_COLUMN,C.CONSTRAINT_NAME FK_NAME,T.TABLE_COMMENT REMARK,R.UPDATE_RULE UPDATE_ACTION,R.DELETE_RULE DELETE_ACTIONFROMINFORMATION_SCHEMA.KEY_COLUMN_USAGE CJOIN INFORMATION_SCHEMA.TABLES T ON T.TABLE_NAME = C.TABLE_NAMEJOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS R ON R.TABLE_NAME = C.TABLE_NAMEAND R.CONSTRAINT_NAME = C.CONSTRAINT_NAMEAND R.REFERENCED_TABLE_NAME = C.REFERENCED_TABLE_NAMEWHEREC.REFERENCED_TABLE_NAME IS NOT NULLAND C.table_schema = (SELECT DATABASE())

4. 获取表所有的索引

SELECT TABLE_NAME TB_NAME,INDEX_NAME,GROUP_CONCAT(COLUMN_NAME) AS INDEX_COLUMNS,SUBSTRING_INDEX(GROUP_CONCAT(INDEX_COMMENT), ',', 1) AS COMMENTS,SUBSTRING_INDEX(GROUP_CONCAT(INDEX_TYPE), ',', 1) AS INDEX_TYPEFROM INFORMATION_SCHEMA.STATISTICSWHERE TABLE_SCHEMA = (SELECT DATABASE()) AND INDEX_NAME != 'PRIMARY'GROUP BY TABLE_NAME, INDEX_NAME

5. 获取表所有的触发器

SELECT EVENT_OBJECT_TABLE AS TB_NAME,TRIGGER_NAME AS TRI_NAME,ACTION_TIMING AS TRI_TIME,EVENT_MANIPULATION AS TRI_ENVENT,ACTION_STATEMENT AS SQL_STRFROM information_schema.TRIGGERSWHERE trigger_schema = (SELECT DATABASE())

6. 获取数据库所有的视图信息

SELECT a.TABLE_NAME VIEW_NAME,a.TABLE_NAME AS EN_NAME,a.TABLE_COMMENT COMMENTS,a.CREATE_TIME AS CREATE_TIME,b.VIEW_DEFINITION AS SQL_STRFROM INFORMATION_SCHEMA.TABLES a LEFT JOININFORMATION_SCHEMA.VIEWS b ON a.table_NAME = b.TABLE_NAMEWHERE a.table_type = 'VIEW' AND a.TABLE_SCHEMA = (SELECT DATABASE())

7. 获取数据库所有的函数信息

SELECT ROUTINE_NAME AS FUNC_NAME,ROUTINE_NAME AS EN_NAME,ROUTINE_DEFINITION AS SQL_STRFROM INFORMATION_SCHEMA.ROUTINESWHERE ROUTINE_TYPE = 'FUNCTION'AND Routine_SCHEMA = (SELECT DATABASE())

8. 获取数据库所有的存储过程

SELECT ROUTINE_NAME AS PROC_NAME,ROUTINE_NAME AS EN_NAME,ROUTINE_DEFINITION AS SQL_STRFROM INFORMATION_SCHEMA.ROUTINESWHERE ROUTINE_TYPE = 'PROCEDURE'AND Routine_SCHEMA = (SELECT DATABASE())

实践脚本

代码解析

  • PigConstant
public final class PigConstant {/*** 数据库*/public static String PIG_MYSQL = "query/10_mysql.xml";public static String PIG_ORACLE = "query/11_oracle.xml";public static String PIG_POSTGRESQL = "query/12_postgresql.xml";public static String PIG_GREENPLUM = "query/12_greenplum.xml";public static String PIG_SQLSERVER = "query/13_sqlserver.xml";public static String PIG_DM = "query/14_dm.xml";public static String PIG_KINGBASE8 = "query/15_kingbase8.xml";public static String PIG_OSCAR = "query/16_oscar.xml";/*** 查询*/public static String PIG_1_DB_TABLE_SQL = "1dbTableSql";public static String PIG_2_DB_TABLEFIELD_SQL = "2dbTableFieldSql";public static String PIG_3_DB_TABLEFK_SQL = "3dbTableFkSql";public static String PIG_4_DB_TABLEINDEX_SQL = "4dbTableIndexSql";public static String PIG_5_DB_TABLETRI_SQL = "5dbTableTriSql";public static String PIG_6_DB_VIEW_SQL = "6dbViewSql";public static String PIG_7_DB_FUNCTION_SQL = "7dbFunctionSql";public static String PIG_8_DB_PROCEDURE = "8dbProcedure";}
  • IPigSqlQuery
public interface IPigSqlQuery {Object getDbQuery(String name); }
  • PigSqlQuery
import org.dom4j.Document; import org.dom4j.DocumentException; import org.dom4j.Element; import org.dom4j.io.SAXReader; import org.springframework.core.io.ClassPathResource;import java.io.IOException; import java.io.InputStream; import java.util.HashMap; import java.util.List; import java.util.Map;public class PigSqlQuery implements IPigSqlQuery {private static Map<String, Object> dbQuery = new HashMap<>();public PigSqlQuery(String configXml) {InputStream is = null;Document document = null;try {SAXReader reader = new SAXReader();ClassPathResource classPathResource = new ClassPathResource(configXml);is = classPathResource.getInputStream();document = reader.read(is);is.close();List<Element> elements = document.getRootElement().elements();for (Element element : elements) {String id = element.attributeValue("id");//获取id属性String content = element.getText();dbQuery.put(id, content);}} catch (IOException e) {e.printStackTrace();} catch (DocumentException e) {e.printStackTrace();} finally {if (is != null) {try {is.close();} catch (IOException e) {e.printStackTrace();}}if (document != null) {document.clone();}}}/*** 根据名字来获取实例** @param name* @return*/@Overridepublic String getDbQuery(String name) {String sql = (String) dbQuery.get(name);return sql;} }

总结

以上是生活随笔为你收集整理的数据中台-数据采集的全部内容,希望文章能够帮你解决所遇到的问题。

如果觉得生活随笔网站内容还不错,欢迎将生活随笔推荐给好友。