Things to consider for Fractional Revenue Attribution

Revenue attribution is the hottest topic these days. The proliferation of online media requires reshuffling marketing spend across many more spend categories. Traditional funnel-engineering work is useful, but it is static and does not address a few key issues:

1) The transient nature of marketing spend effectiveness, which comes and goes with changing keywords, banners, and offers

2) It does not address the problem in a customer-centric manner (in fact, orders are placed by customers who clicked on a keyword or received a catalog)

The new marketing spend effectiveness paradigm involves understanding the causal relationship between marketing and sales at the transaction level, using statistical methods to attribute revenue fractionally. There are five elements at play:

  1. Order of events: what sequencing (order) of actions leads to sales transactions
  2. Combined effects: what the joint effects of marketing touches are
  3. Frequency: how many touches are required to convert a prospect into a buyer
  4. Time decay: how the effect of marketing on sales decays as time passes
  5. Effectiveness: the relative efficacy of each vehicle is different (e.g., a banner view does not have the same effectiveness as a 52-page catalog)

How this problem is expressed in mathematical terms, and how it is solved, is quite sophisticated, and I cannot get into it here since this is our core IP at Agilone.
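To make the general idea concrete without touching the proprietary formulation, here is a minimal, generic sketch of time-decayed fractional attribution. It only illustrates elements 4 and 5 above; the vehicle weights, the 14-day half-life, and the `Touch` structure are assumptions made up for illustration, not the Agilone model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

# Hypothetical per-vehicle effectiveness weights (element 5); values are illustrative only.
VEHICLE_WEIGHT = {"banner_view": 0.2, "email_click": 0.6, "paid_search": 0.8, "catalog": 1.0}

HALF_LIFE_DAYS = 14.0  # assumed exponential time-decay half-life (element 4)


@dataclass
class Touch:
    vehicle: str          # e.g. "banner_view"
    timestamp: datetime   # when the customer was touched


def fractional_attribution(touches: List[Touch], order_ts: datetime) -> Dict[str, float]:
    """Split one order's revenue credit across the marketing touches that preceded it.

    Each touch gets a raw score = vehicle weight * exponential time decay,
    and the scores are normalized so the fractions sum to 1.
    """
    scored = []
    for t in touches:
        age_days = (order_ts - t.timestamp).total_seconds() / 86400.0
        if age_days < 0:
            continue  # ignore touches that happened after the order
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        scored.append((t.vehicle, VEHICLE_WEIGHT.get(t.vehicle, 0.1) * decay))

    total = sum(score for _, score in scored)
    if total == 0:
        return {}

    fractions: Dict[str, float] = {}
    for vehicle, score in scored:
        fractions[vehicle] = fractions.get(vehicle, 0.0) + score / total
    return fractions
```

With these assumed weights and half-life, a paid-search click the day before an order takes most of the credit away from a catalog mailed a month earlier, which is the kind of behavior the time-decay and effectiveness elements are meant to capture.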

Once an attribution can be made, the next issue is how to measure the effects of overspending, which I will get into in the next post. The inherent problem in fractional attribution is accounting for the fact that increasing marketing spend on one vehicle will most likely reduce the measured effectiveness of the other existing spend elements, even when there is no true causal effect.

Big Data question: what to save for how long?

As tools in the big data world emerge and mature, the question is how much of the data to save in high versus low resolution. The answer depends on the uses of this data. Recently, I had lunch with someone from Yahoo who was doing modeling on full-resolution data and claimed that you need big-data tools (Hadoop, Mahout) to build predictive models.

The problem of predictive algorithms requiring more data only arises if the number of predictive independent variables is large. A higher number of variables requires larger datasets to train classification models (see the curse of dimensionality, coined by Richard Bellman, the godfather of dynamic programming). In any case, big data tooling gives us 1-2 orders of magnitude more processing power, which only allows for a few more variables, since the volume of data required increases exponentially with each new variable.
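A back-of-the-envelope sketch of that exponential relationship (the 10 bins per variable and 30 observations per cell are assumptions chosen purely for illustration):

```python
# Curse-of-dimensionality arithmetic: if each predictive variable is discretized
# into `bins` levels and we want `samples_per_cell` observations per cell on
# average, the required dataset grows exponentially with the number of variables.
def required_rows(num_variables: int, bins: int = 10, samples_per_cell: int = 30) -> int:
    return samples_per_cell * bins ** num_variables


for d in (3, 5, 7):
    print(d, "variables ->", f"{required_rows(d):,}", "rows")
# 3 variables -> 30,000 rows
# 5 variables -> 3,000,000 rows
# 7 variables -> 300,000,000 rows
```

Under these assumptions, two extra variables consume the entire two orders of magnitude of extra capacity that big-data tooling buys you.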

Perhaps the more important questions to ask are why we need the data and how much of it we need for what we are trying to do. We provide marketing analytics to our clients, so our focus is marketing. In the case of mining web analytics logs, there are four main uses:

  1. Revenue Attribution
  2. Modeling
  3. Triggering marketing actions
  4. Building temporal statistics on customer actions

These four uses differ along two dimensions in how data must be saved:

  1. Length of time (retention)
  2. Resolution

Here is a simple depiction of the uses by resolution and data retention.
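As a purely hypothetical sketch of what such a mapping could look like in code, the retention windows and resolutions below are placeholder assumptions, not recommendations:

```python
# Hypothetical retention policy for web analytics logs, organized along the two
# dimensions above. All windows and resolutions are placeholder assumptions.
RETENTION_POLICY = {
    "revenue_attribution":  {"resolution": "event-level",        "retention_days": 90},
    "modeling":             {"resolution": "sampled/aggregated", "retention_days": 365},
    "trigger_marketing":    {"resolution": "event-level",        "retention_days": 30},
    "temporal_statistics":  {"resolution": "daily rollups",      "retention_days": 730},
}
```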

Determining how much to keep after the initial 90 days or so depends on the modeling uses. If the models being built have a natural 3-4% response rate, you need to keep data amounting to approximately double that rate, so that negative-outcome events are properly represented (in effect, you are oversampling the success events). This level of data retention is enough for most propensity and event modeling exercises, since the resulting dataset is still quite large.
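A minimal sketch of that sampling step, assuming the events live in a pandas DataFrame with a binary `responded` column; the column name and the default negative-to-positive ratio are assumptions for illustration:

```python
import pandas as pd


def build_training_sample(events: pd.DataFrame,
                          neg_to_pos_ratio: float = 1.0,
                          seed: int = 42) -> pd.DataFrame:
    """Keep every rare success event plus a capped sample of negatives.

    With a natural 3-4% response rate, keeping roughly as many negatives as
    positives retains about double that fraction of the data and effectively
    oversamples the success events while still representing negative outcomes.
    """
    positives = events[events["responded"] == 1]
    negatives = events[events["responded"] == 0]
    n_keep = min(len(negatives), int(len(positives) * neg_to_pos_ratio))
    negatives_sample = negatives.sample(n=n_keep, random_state=seed)
    # Shuffle the combined sample so downstream model training sees a mixed order.
    return pd.concat([positives, negatives_sample]).sample(frac=1.0, random_state=seed)
```

If calibrated probabilities are needed downstream, the model scores would of course have to be corrected for this sampling ratio afterwards.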