Quickest way to return snapshots of an Exponential Moving Average in PostgreSQL
I am collecting and graphing data, and one of the things I need to do is calculate an exponential moving average of that data. The data is stored in Postgres.
Based on a Stack Overflow page I read (How to calculate an exponential moving average on Postgres?), I have the following function:
create or replace function ema_func(
    state double precision, inval double precision, alpha double precision)
  returns double precision as
$body$
begin
  return case
           when state is null then inval
           else alpha * inval + (1 - alpha) * state
         end;
end
$body$ language plpgsql volatile;
which I am using in an aggregate I put together:
create aggregate ema(double precision, double precision) (
  sfunc = ema_func,
  stype = float8
);
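For reference, the recurrence that this aggregate implements can be sketched outside the database. This is a hypothetical Python rendering of the same logic (the function names are mine, not part of the original post):

```python
def ema_step(state, inval, alpha):
    """One aggregate transition: mirrors the CASE expression in ema_func."""
    if state is None:
        return inval  # the first row seeds the average
    return alpha * inval + (1 - alpha) * state


def ema(values, alpha):
    """Fold ema_step over a series, like ema(value, alpha) OVER (ORDER BY datetime)."""
    state = None
    out = []
    for v in values:
        state = ema_step(state, v, alpha)
        out.append(state)
    return out
```

Each output row depends only on the previous state and the current value, which is why a single ordered window pass over the table is enough to compute it.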
I am graphing stock information, and a given day has 7000-8000 pieces of data. I don't need all of that information to graph the data (depending on window settings, 1 pixel might be worth somewhere around 60 seconds), so I want to pull snapshots of the data every nth second. I wrote this function to do that for me, and it has saved me some time:
create or replace function emasnapshots(
    ptable varchar, timestart timestamptz, timeend timestamptz,
    duration double precision, psymbol varchar, alpha double precision)
  returns setof timevalue as
$body$
declare
  localstart timestamptz;
  localend timestamptz;
  timevalues timevalue%rowtype;
  groups int := ceil((select extract(epoch from (timeend - timestart))) / duration);
begin
  execute 'create temp table allemas on commit drop as '
       || 'select datetime, ema(value, ' || quote_literal(alpha) || ') over (order by datetime asc) '
       || 'from ' || quote_ident(ptable)
       || ' where symbol = ' || quote_literal(psymbol)
       || ' and datetime >= ' || quote_literal(timestart)
       || ' and datetime <= ' || quote_literal(timeend);
  for i in 1 .. groups loop
    localstart := timestart + (duration * (i - 1) * interval '1 second');
    localend := timestart + (duration * i * interval '1 second');
    execute 'select * from allemas where datetime >= ' || quote_literal(localstart)
         || ' and datetime <= ' || quote_literal(localend)
         || ' order by datetime desc limit 1'
       into timevalues;
    return next timevalues;
  end loop;
  return;
end
$body$ language plpgsql volatile;
Running the EMA with:
select datetime::timestamptz,
       ema(value, 0.0952380952380952380952380952381 /* alpha */) over (order by datetime asc)
from "4"
where symbol = 'aapl'
  and datetime >= '2015-07-01 7:30'
  and datetime <= '2015-07-01 14:00:00';
takes about 1.5 seconds to collect all of the data (7733 rows) and push it across the internet (my data is in another state).
Running the emasnapshots function I wrote:
select start, average
from emasnapshots('4', '2015-07-01 9:30-4', '2015-07-01 16:00-4', 60, 'aapl',
                  0.0952380952380952380952380952381);
takes about 0.5 seconds to gather all of the data and push it across the internet (390 rows). BTW, for clarity: I am pulling from table "4" for July 1st during stock-market hours, and I want snapshots every 60 seconds. The last number is alpha, which means I am calculating 20-second EMAs (alpha = 2/(period + 1)).
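As a quick sanity check on that alpha literal (an illustrative calculation, assuming the conventional EMA smoothing formula alpha = 2/(period + 1)):

```python
# Deriving the alpha constant used in the queries above from the EMA period.
period = 20               # a 20-sample EMA
alpha = 2 / (period + 1)  # -> 0.0952380952..., the long literal passed to ema()
```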
My question is: am I doing this the fastest way possible? Is there a way to tell which part of the function is the slower part: the temp table creation, or the snapshot-grabbing part? Should I be selecting the most recent date in each interval a different way? Should I be selecting the latest time in each interval from the original table (which is indexed on time) and joining it with the newly created table?
I started writing Postgres functions a week ago. I realize the newly created table is not indexed and might take a bit longer for the date-related work I am asking it to do. Is there a way around this? I am dealing with lots of days' worth of data for lots of different symbols, so I am not sure pre-creating EMA tables for all the possibilities is a good idea. I don't want to pull all of the data down and process it locally, because if the graphing software has multiple days open, that could encompass 35,000 rows that would have to be transferred and processed.
BTW, I don't think indexing will speed this up much, because I can run:
select *
from "4"
where symbol = 'aapl'
  and datetime >= '2015-07-01 07:30'
  and datetime <= '2015-07-01 14:00'
order by datetime asc
limit 450;
and get a response in under 150 ms over the internet. There is far less processing involved in that query, though.
Thank you for your time!
Edited based on Patrick's answer.
I now have the query below, modified as Patrick suggested:
select datetime, ema
from (
  select datetime, ema,
         rank() over (partition by bucket order by datetime desc) = 1 as rank
  from (
    select datetime,
           ema(value, 0.0952380952380952380952380952381) over (order by datetime asc) as ema,
           ceil(extract(epoch from (datetime - '2015-07-01 7:30')) / 60) as bucket
    from "4"
    where symbol = 'aapl'
      and datetime between '2015-07-01 7:30' and '2015-07-01 14:00'
  ) x
) y
where rank = true;
because I was getting an error saying I couldn't put the rank expression in the where clause, so I split it out into a separate select statement. Did I do that right? Having 3 select statements feels odd to me, but as a SQL newbie trying to learn as I go, maybe it isn't bad.
My explain output for the above query looks like this:
Subquery Scan on y  (cost=6423.35..6687.34 rows=4062 width=16)
  Filter: y.rank
  ->  WindowAgg  (cost=6423.35..6606.11 rows=8123 width=24)
        ->  Sort  (cost=6423.35..6443.65 rows=8123 width=24)
              Sort Key: x.bucket, x.datetime
              ->  Subquery Scan on x  (cost=5591.23..5895.85 rows=8123 width=24)
                    ->  WindowAgg  (cost=5591.23..5814.62 rows=8123 width=16)
                          ->  Sort  (cost=5591.23..5611.54 rows=8123 width=16)
                                Sort Key: "4".datetime
                                ->  Bitmap Heap Scan on "4"  (cost=359.99..5063.74 rows=8123 width=16)
                                      Recheck Cond: (((symbol)::text = 'aapl'::text) AND (datetime >= '2015-07-01 07:30:00-06'::timestamp with time zone) AND (datetime <= '2015-07-01 14:00:00-06'::timestamp with time zone))
                                      ->  Bitmap Index Scan on "4_pkey"  (cost=0.00..357.96 rows=8123 width=0)
                                            Index Cond: (((symbol)::text = 'aapl'::text) AND (datetime >= '2015-07-01 07:30:00-06'::timestamp with time zone) AND (datetime <= '2015-07-01 14:00:00-06'::timestamp with time zone))
First, a few notes on minor efficiency issues in your function:

- You do not have to quote_literal() anything other than strings. It is impossible for Bobby Tables to get injected into your SQL statement through a double precision or timestamp parameter.
- In a dynamic SQL statement you only have to manually splice in table and column names; parameter values can be injected with a USING clause. That saves a lot of parsing time.
- Move as many calculations as possible outside of the loop. For instance:
declare
  ...
  dur_int    interval    := duration * interval '1 second';
  localstart timestamptz := timestart;
  localend   timestamptz := localstart + dur_int;
begin
  ...
  for i in 1 .. groups loop
    ...
    localstart := localstart + dur_int;
    localend   := localend + dur_int;
  end loop;
  ...
But all of this is moot...
In your code you first populate a temp table with 7733 rows of data, and later extract one record at a time with a dynamic query inside a loop that runs 390 times. Very, very wasteful. You can replace the entire function body with a single statement:
return query execute format(
  'select datetime, ema '
  'from ('
    'select datetime, ema, '
           'rank() over (partition by bucket order by datetime desc) as rank '
    'from ('
      'select datetime, ema(value, $1) over (order by datetime asc) as ema, '
             'ceil(extract(epoch from (datetime - $2)) / $3) as bucket '
      'from %I '
      'where symbol = $4 '
      'and datetime between $2 and $5) x '
    'where rank = 1) y '
  'order by 1', ptable)
using alpha, timestart, duration, psymbol, timeend;
The principle here is that in the innermost query you calculate the "bucket" into which each processed row from the table falls. In the next-level query you calculate the rank of all the rows in each bucket, based on datetime. In the main query you then pick the most recent row for each bucket, i.e. the one with rank = 1.
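The bucket-and-pick-latest idea can also be sketched procedurally. This is a hypothetical Python version (names and the epoch-seconds representation are mine): keeping the last row seen per bucket is equivalent to rank = 1 under ORDER BY datetime DESC.

```python
import math


def snapshots(rows, timestart, duration):
    """rows: (epoch_seconds, ema) pairs in ascending time order.
    Returns the latest row of each `duration`-second bucket,
    mirroring ceil(extract(epoch from (datetime - timestart)) / duration)."""
    latest = {}  # bucket number -> row; later rows overwrite earlier ones
    for ts, ema_val in rows:
        bucket = math.ceil((ts - timestart) / duration)
        latest[bucket] = (ts, ema_val)
    return [latest[b] for b in sorted(latest)]
```

One pass over the data suffices, which is why the single set-based query beats the 390-iteration loop over a temp table.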
On speed in general: you should run EXPLAIN on your queries on the server, rather than measuring on the client, which includes network transfer time.