Greenplum数据加载方式(1) – insert 和 copy
4,835 total views, 1 views today
greenplum数据加载主要包括insert、copy、外部表、gpload、web external table等五种方式。
其中insert和copy是串行;外部表gpfdist和gpload工具是并行方式。
1、insert
这种加载方式和其他数据库SQL语法一样,但是效率最差,只适合加载极少数数据。需要通过master节点操作。
<1>直接在plsq客户端中执行insert语句
注意,插入空值,需用null表示,这点和oracle不同。如,
1 2 3 4 5 |
testDB=# insert into t values('','szlsd'); ERROR: invalid input syntax for integer: "" testDB=# testDB=# insert into t values(null,'szlsd'); INSERT 0 1 |
<2>批量insert,类似于oracle的@t.sql
1 |
$ psql testDB -f /home/gpadmin/script/t.sql |
或者psql登录后执行
1 |
testDB=# \i /home/gpadmin/script/t.sql |
2、copy方式
比SQL方式效率大大提升,但是数据依然需要通过master节点,无法实现并发高效数据加载。
将member.txt文件中的数据导入到t表中(注意,如果文件中有空行,则会引起导入失败)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
$ more member.txt 10 , jj 11 , kk 2 , tom 3 , cc 4 , dd 5 , ee 6 , ff 7 , gg 8 , hh 9 , ii 10 , jj 11 , kk 2 , tom 3 , cc 4 , dd 5 , ee 6 , ff 7 , gg 8 , hh 9 , ii 10 |
copy方式加载数据
1 |
testDB=# copy t(id,name) from '/home/gpadmin/script/member.txt' with delimiter ','; |
copy语法
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
COPY table [(column [, ...])] FROM {'file' | STDIN} [ [WITH] [OIDS] [HEADER] [DELIMITER [ AS ] 'delimiter'] [NULL [ AS ] 'null string'] [ESCAPE [ AS ] 'escape' | 'OFF'] [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF'] [CSV [QUOTE [ AS ] 'quote'] [FORCE NOT NULL column [, ...]] [FILL MISSING FIELDS] [[LOG ERRORS [INTO error_table] [KEEP] SEGMENT REJECT LIMIT count [ROWS | PERCENT] ] COPY {table [(column [, ...])] | (query)} TO {'file' | STDOUT} [ [WITH] [OIDS] [HEADER] [DELIMITER [ AS ] 'delimiter'] [NULL [ AS ] 'null string'] [ESCAPE [ AS ] 'escape' | 'OFF'] [CSV [QUOTE [ AS ] 'quote'] [FORCE QUOTE column [, ...]] ] [IGNORE EXTERNAL PARTITIONS ] |
常用参数
分隔符:[DELIMITER [ AS ] ‘delimiter’]
处理空列(含有空格符的是不行的):[NULL [ AS ] ‘null string’]
记录错误数据,错误日志表自动创建: [[LOG ERRORS [INTO error_table] [KEEP]
允许错误的行数,大于指定值导入失败全部回滚:SEGMENT REJECT LIMIT count [ROWS | PERCENT] ]