Fixed a typo in the uplift tutorial and clarified AUUC

PiperOrigin-RevId: 545212831
Richard Stotz 2023-07-03 07:00:15 -07:00 committed by Copybara-Service
parent 6ec43809e6
commit 407772c2fc
1 changed files with 6 additions and 6 deletions

@@ -67,7 +67,7 @@
 "\n",
 "In this colab, you will:\n",
 "\n",
-"- Learn what an uplift modeling is is.\n",
+"- Learn what an uplift modeling is.\n",
 "- Train a Uplift Random Forest model on the **Hillstrom Email Marketing** dataset.\n",
 "- Evaluate the quality of this model.\n"
 ]
@@ -177,7 +177,7 @@
 "source": [
 "## What is uplift modeling?\n",
 "\n",
-"[Uplift modeling](https://en.wikipedia.org/wiki/Uplift_modeling) is a statistical modeling technique to predict the **incremental impact of an action** on a subject. The action is often referred to as a **treatment** that may or may not be applied.\n",
+"[Uplift modeling](https://en.wikipedia.org/wiki/Uplift_modelling) is a statistical modeling technique to predict the **incremental impact of an action** on a subject. The action is often referred to as a **treatment** that may or may not be applied.\n",
 "\n",
 "Uplift modeling is often used in targeted marketing campaigns to predict the increase in the likelihood of a person making a purchase (or any other desired action) based on the marketing exposition they receive.\n",
 "\n",
@@ -382,11 +382,11 @@
 "\n",
 "Suppose you have a labeled dataset with $|T|$ examples with treatment and $|C|$ examples without treatment, called *control* examples. For each example, the uplift model $f$ produces the conditional probability that a treatment on the example will yield a positive outcome.\n",
 "\n",
-"Suppose a decision-maker needs to decide which clients to send a voucher using an uplift model $f$. The model produces a (conditional) probability that the voucher will result in a conversion. The decision-maker might therefore just pick the number $k$ of vouchers to send and send those $k$ vouchers to the clients with the highest probability.\n",
+"Suppose a decision-maker needs to decide which clients to send an email using an uplift model $f$. The model produces a (conditional) probability that the email will result in a conversion. The decision-maker might therefore just pick the number $k$ of emails to send and send those $k$ emails to the clients with the highest probability.\n",
 "\n",
-"Using a labeled test dataset, it is possible to study the impact of $k$ on the success of the campaign. First, we are interested in the ratio $\\frac{|C \\cap T|}{|T|}$ of clients with voucher that converted versus total number of clients with voucher. Here $C$ is the set of clients that converted and $T$ is the number of clients that coverted. We plot this ratio against $k$.\n",
+"Using a labeled test dataset, it is possible to study the impact of $k$ on the success of the campaign. First, we are interested in the ratio $\\frac{|C \\cap T|}{|T|}$ of clients that received an email that converted versus total number of clients that received an email. Here $C$ is the set of clients that received an email and converted and $T$ is the total number of clients that received an email. We plot this ratio against $k$.\n",
 "\n",
-"Ideally, we like to have this curve increase steeply. This would mean that the model prioritizes sending vouchers to those clients that will generate a conversion when receiving a voucher."
+"Ideally, we like to have this curve increase steeply. This would mean that the model prioritizes sending email to those clients that will generate a conversion when receiving an email."
 ]
 },
 {
@@ -429,7 +429,7 @@
 "id": "97IFpq5epHsx"
 },
 "source": [
-"Similarly, we can also compute and plot the conversion ratio of those not receiving a voucher, called the *control group*. Ideally, this curve is initially flat: This would mean that the model does not prioritize sending vouchers to clients that will generate a conversion despite **not** receiving a voucher"
+"Similarly, we can also compute and plot the conversion ratio of those not receiving an email, called the *control group*. Ideally, this curve is initially flat: This would mean that the model does not prioritize sending emails to clients that will generate a conversion despite **not** receiving a email"
 ]
 },
 {