sql server - Best method to merge two sql tables toegether -


so have 2 tables. 1 tracking a persons location, , 1 has shifts of staff members.

staff members have staffid, location, start , end times, , cost of shift.

people have eventid, stayid, personid, location, start , end time. person have event multiple stays.

what attempting mesh these 2 tables together, can accurately report cost of each location stay, based on duration of stay multiplied associated cost of staff covering location @ time.

the issues have are:

  1. location stays not align staff shifts. i.e. person might in location between 1pm , 2pm, , 4 staff might on shifts 12:30 1:30, , 2 on 1:30 till 5.
  2. there lot of records.
  3. not staff paid same

my current method expand both tables have record every single minute. stay between 1pm , 2pm have 60 records, , staff shift goes 5 hours have 300 records. can take staff working on location @ minute minute value based on cost of each staff member divided duration of shift, , apply value corresponding record in other table.

techniques used:

  1. i create table 50,000 numbers, since stays can quite long.
  2. i take staff table , join onto numbers table split each shift. group based on location , minute, staff count , minute cost.
  3. the final step, , 1 causing issues, take location table, join onto numbers, , onto modified staff table produce cost minute. count number of people in location account staff covering multiple people.

i'm finding process extremely slow can imagine, since person table has 500 million records when expanded minute level, , staff table has 35 million when same thing done.

can people suggest better method me use?

sample data: locations

| eventid |  id | person | loc |          start         |         end |  1      | 987 |  123   |  1  | may, 20 2015 07:00:00 | may, 20 2015 08:00:00  |  1      | 374 |  123   |  4  | may, 20 2015 08:00:00 | may, 20 2015 10:00:00  |  1      | 184 |  123   |  3  | may, 20 2015 10:00:00 | may, 20 2015 11:00:00  |  1      | 798 |  123   |  8  | may, 20 2015 11:00:00 | may, 20 2015 12:00:00  

staff

| loc | staffid | cost |         start         |         end |  1  | 99      |  40  | may, 20 2015 04:00:00 | may, 20 2015 12:00:00  |  1  | 15      |  85  | may, 20 2015 03:00:00 | may, 20 2015 5:00:00  |  3  | 85      |  74  | may, 20 2015 18:00:00 | may, 20 2015 20:00:00  |  4  | 10      |  36  | may, 20 2015 06:00:00 | may, 20 2015 14:00:00  

result

| eventid | id | person | loc | start | end | cost | 1 | 987 | 123 | 1 | may, 20 2015 07:00:00 | may, 20 2015 08:00:00 | 45.50 | 1 | 374 | 123 | 4 | may, 20 2015 08:00:00 | may, 20 2015 10:00:00 | 81.20 | 1 | 184 | 123 | 3 | may, 20 2015 10:00:00 | may, 20 2015 11:00:00 | 95.00 | 1 | 798 | 123 | 8 | may, 20 2015 11:00:00 | may, 20 2015 12:00:00 | 14.75

sql: numbers table

;with x  (   select top (224) object_id  sys.all_objects  ) select top (50000) n = row_number() on (order x.object_id)  #numbers x cross join x y  order n 

staff table

select      location,     isnull(sum(round(cost/ case when (datediff(minute, startdatetime, enddatetime)) = 0 1 else (datediff(minute, startdatetime, enddatetime)) end, 5)),0) minutecost,     count(name) staffcount,     rosterminute = dateadd(mi, datediff(mi, 0, startdatetime) + n.n -1,     0)  #temp_staffroster dbo.staffroster 

grouping together, , needed think

    insert dbo.finaltable     select [eventid]           ,[id]           ,[start]           ,[end]           ,event.[location]           ,sum(isnull(minutecost,1)/isnull(peoplecount, 1)) cost           ,avg(isnull(staffcount,1)) avgstaff       dbo.events event (nolock)        inner join #numbers n on n.n between 0 ,  datediff(minute, start, end)       left outer join #temp_staffroster staff (nolock) on staff.location= event.location , staff.rosterminute = dateadd(mi, datediff(mi, 0, start) + n.n -1 ,     0)       left outer join (select [location], dateadd(mi, datediff(mi, 0, start) + n.n -1 ,     0) mins, count(id) peoplecount                        dbo.events (nolock)                         inner join #numbers n on n.n between 0 ,  datediff(minute, start, end)                        group [location], dateadd(mi, datediff(mi, 0, start) + n.n -1 ,     0)                        ) cap on cap.location= event.locationand cap.mins = dateadd(mi, datediff(mi, 0, start) + n.n -1 ,     0)        group [eventid]               ,[id]               ,[start]               ,[end]               ,event.[location] 

update

so have 2 tables. 1 tracking a persons location, , 1 has shifts of staff members cost. attempting consolidate 2 tables calculate cost of each location stay.

here method:

;;with stay (     select top 650000         stayid,         location,         start,         end     stg_stay     loction not null  -- locations don't have matching shift location     order location, adtm ), shift (     select top 36000000         location,         shiftminute,         minutecost,         staffcount     stg_shifts     order location, shiftminute )  select      [stayid],     sum(minutecost) cost,     avg(staffcount) staffcount newtable stay s cross apply (select minutecost, staffcount                 shift r                  r.location = s.location                  , r.shiftminute between s.start , s.end              ) shifts group [stayid] 

this i'm at.

i've split shifts table minute minute level since there no clear alignment of shifts stays.

stg_stay contains more columns needed operation. stg_shift shown.

indexes used on stg_shifts:

create nonclustered index ix_shifts_loc_min on dbo.stg_shifts (location, shiftminute) include (minutecost, staffcount);  

on stg_stay

create index ix_stay_stayid on dbo.stg_stay (stayid); create clustered index ix_stay_start_end_loc on dbo.stg_stay (location,start,end);  

due fact shifts has ~36 million records , stays has ~650k, can make perform better?

  1. don't break down rows minutes.
  2. staging table may if can create fast relationship between them. i.e. overlapped interval

select *  locations l outer apply -- assume staff won't appear in different location in same period of time, of course. (   select      convert(decimal(14,2), sum(costperminute * overlappedminutes)) actualcost,     count(distinct staffid) staffcount,     sum(overlappedminutes) staffminutes     (     select        *,       -- calculate overlapped time in minutes       datediff(minute,         case when starttime > l.starttime starttime else l.starttime end, -- greatest start time         case when endtime > l.endtime l.endtime else endtime end -- least end time       ) overlappedminutes,       cost / datediff(minute, starttime, endtime) costperminute     staff      locationid = l.locationid        , starttime <= l.endtime , l.starttime <= endtime -- match overlapped time   ) data ) staffinloc 

sql fiddle


Comments

Popular posts from this blog

python - No exponential form of the z-axis in matplotlib-3D-plots -

php - Best Light server (Linux + Web server + Database) for Raspberry Pi -

c# - "Newtonsoft.Json.JsonSerializationException unable to find constructor to use for types" error when deserializing class -