sql server - Best method to merge two sql tables toegether -
so have 2 tables. 1 tracking a persons location, , 1 has shifts of staff members.
staff members have staffid, location, start , end times, , cost of shift.
people have eventid, stayid, personid, location, start , end time. person have event multiple stays.
what attempting mesh these 2 tables together, can accurately report cost of each location stay, based on duration of stay multiplied associated cost of staff covering location @ time.
the issues have are:
- location stays not align staff shifts. i.e. person might in location between 1pm , 2pm, , 4 staff might on shifts 12:30 1:30, , 2 on 1:30 till 5.
- there lot of records.
- not staff paid same
my current method expand both tables have record every single minute. stay between 1pm , 2pm have 60 records, , staff shift goes 5 hours have 300 records. can take staff working on location @ minute minute value based on cost of each staff member divided duration of shift, , apply value corresponding record in other table.
techniques used:
- i create table 50,000 numbers, since stays can quite long.
- i take staff table , join onto numbers table split each shift. group based on location , minute, staff count , minute cost.
- the final step, , 1 causing issues, take location table, join onto numbers, , onto modified staff table produce cost minute. count number of people in location account staff covering multiple people.
i'm finding process extremely slow can imagine, since person table has 500 million records when expanded minute level, , staff table has 35 million when same thing done.
can people suggest better method me use?
sample data: locations
| eventid | id | person | loc | start | end | 1 | 987 | 123 | 1 | may, 20 2015 07:00:00 | may, 20 2015 08:00:00 | 1 | 374 | 123 | 4 | may, 20 2015 08:00:00 | may, 20 2015 10:00:00 | 1 | 184 | 123 | 3 | may, 20 2015 10:00:00 | may, 20 2015 11:00:00 | 1 | 798 | 123 | 8 | may, 20 2015 11:00:00 | may, 20 2015 12:00:00
staff
| loc | staffid | cost | start | end | 1 | 99 | 40 | may, 20 2015 04:00:00 | may, 20 2015 12:00:00 | 1 | 15 | 85 | may, 20 2015 03:00:00 | may, 20 2015 5:00:00 | 3 | 85 | 74 | may, 20 2015 18:00:00 | may, 20 2015 20:00:00 | 4 | 10 | 36 | may, 20 2015 06:00:00 | may, 20 2015 14:00:00
result
| eventid | id | person | loc | start | end | cost | 1 | 987 | 123 | 1 | may, 20 2015 07:00:00 | may, 20 2015 08:00:00 | 45.50 | 1 | 374 | 123 | 4 | may, 20 2015 08:00:00 | may, 20 2015 10:00:00 | 81.20 | 1 | 184 | 123 | 3 | may, 20 2015 10:00:00 | may, 20 2015 11:00:00 | 95.00 | 1 | 798 | 123 | 8 | may, 20 2015 11:00:00 | may, 20 2015 12:00:00 | 14.75
sql: numbers table
;with x ( select top (224) object_id sys.all_objects ) select top (50000) n = row_number() on (order x.object_id) #numbers x cross join x y order n
staff table
select location, isnull(sum(round(cost/ case when (datediff(minute, startdatetime, enddatetime)) = 0 1 else (datediff(minute, startdatetime, enddatetime)) end, 5)),0) minutecost, count(name) staffcount, rosterminute = dateadd(mi, datediff(mi, 0, startdatetime) + n.n -1, 0) #temp_staffroster dbo.staffroster
grouping together, , needed think
insert dbo.finaltable select [eventid] ,[id] ,[start] ,[end] ,event.[location] ,sum(isnull(minutecost,1)/isnull(peoplecount, 1)) cost ,avg(isnull(staffcount,1)) avgstaff dbo.events event (nolock) inner join #numbers n on n.n between 0 , datediff(minute, start, end) left outer join #temp_staffroster staff (nolock) on staff.location= event.location , staff.rosterminute = dateadd(mi, datediff(mi, 0, start) + n.n -1 , 0) left outer join (select [location], dateadd(mi, datediff(mi, 0, start) + n.n -1 , 0) mins, count(id) peoplecount dbo.events (nolock) inner join #numbers n on n.n between 0 , datediff(minute, start, end) group [location], dateadd(mi, datediff(mi, 0, start) + n.n -1 , 0) ) cap on cap.location= event.locationand cap.mins = dateadd(mi, datediff(mi, 0, start) + n.n -1 , 0) group [eventid] ,[id] ,[start] ,[end] ,event.[location]
update
so have 2 tables. 1 tracking a persons location, , 1 has shifts of staff members cost. attempting consolidate 2 tables calculate cost of each location stay.
here method:
;;with stay ( select top 650000 stayid, location, start, end stg_stay loction not null -- locations don't have matching shift location order location, adtm ), shift ( select top 36000000 location, shiftminute, minutecost, staffcount stg_shifts order location, shiftminute ) select [stayid], sum(minutecost) cost, avg(staffcount) staffcount newtable stay s cross apply (select minutecost, staffcount shift r r.location = s.location , r.shiftminute between s.start , s.end ) shifts group [stayid]
this i'm at.
i've split shifts table minute minute level since there no clear alignment of shifts stays.
stg_stay contains more columns needed operation. stg_shift shown.
indexes used on stg_shifts:
create nonclustered index ix_shifts_loc_min on dbo.stg_shifts (location, shiftminute) include (minutecost, staffcount);
on stg_stay
create index ix_stay_stayid on dbo.stg_stay (stayid); create clustered index ix_stay_start_end_loc on dbo.stg_stay (location,start,end);
due fact shifts has ~36 million records , stays has ~650k, can make perform better?
- don't break down rows minutes.
- staging table may if can create fast relationship between them. i.e. overlapped interval
select * locations l outer apply -- assume staff won't appear in different location in same period of time, of course. ( select convert(decimal(14,2), sum(costperminute * overlappedminutes)) actualcost, count(distinct staffid) staffcount, sum(overlappedminutes) staffminutes ( select *, -- calculate overlapped time in minutes datediff(minute, case when starttime > l.starttime starttime else l.starttime end, -- greatest start time case when endtime > l.endtime l.endtime else endtime end -- least end time ) overlappedminutes, cost / datediff(minute, starttime, endtime) costperminute staff locationid = l.locationid , starttime <= l.endtime , l.starttime <= endtime -- match overlapped time ) data ) staffinloc
Comments
Post a Comment