8. Data Sampling Video
eTrainerbox
5/9/2025
Category
Learning
Transcript
00:00
Let's dive into the topic of data sampling.
00:03
To optimize performance, Tableau Prep samples large datasets and returns a subset of records.
00:09
The subset of rows returned from your data sources may or may not be representative of your data,
00:15
depending on the settings that you've chosen.
00:18
Your sample may be optimized for speed, but not necessarily a representative sample of your data.
00:24
So it's important that you make sure your settings are to your liking,
00:28
and if they aren't returning the right results, you can change them right within the input step.
00:33
Let's take a quick look at the different options for data sampling.
00:37
Now you can see the data sampling options inside the Data Sample tab in your input step.
00:43
Once you click on the Data Sample tab, you'll have two main sections to be concerned about.
00:47
You can select the amount of data to include in the flow, and you can choose the sampling method.
00:53
Now the default sample amount is chosen by Prep Builder.
00:57
Prep Builder will determine the number of rows to return for your dataset.
01:01
This is an automatic algorithm that Tableau Prep uses to optimize the data sample.
01:07
Alternatively, you can also choose to use all data to run through your flow.
01:12
This will retrieve all rows regardless of the size of your dataset.
01:16
And as a result, this can cause performance issues.
01:20
Note that even if you pick use all data, there will be some limitations.
01:24
Data will still be limited to 1 million rows or fewer for the aggregate and union steps,
01:30
and 3 million rows or less for the join and pivot steps.
01:35
Now this doesn't mean that the data will be limited when you actually go and run your flow.
01:40
It just means that when you're building the workflow, it's going to limit the amount of data that it uses to profile your datasets.
01:48
That means when you're filtering, aliasing, and doing other cleaning steps, you may be missing data depending on which option you're choosing.
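As a rough illustration of that point, here is a minimal Python sketch, not Tableau Prep's actual behavior. The function names `build_aliases` and `run_flow` and the city values are hypothetical. It shows how cleaning rules built from only the rows visible while building can miss values that appear when the full data runs.

```python
def build_aliases(sampled_values):
    """Map each value seen while building the flow to a cleaned-up label."""
    return {v: v.strip().title() for v in set(sampled_values)}


def run_flow(all_values, aliases):
    """Apply aliases to every row; values never seen in the sample pass through unchanged."""
    return [aliases.get(v, v) for v in all_values]


# Hypothetical column values; only the first two rows are visible while building.
full = ["boston", "boston ", "SPRINGFIELD", "worcester"]
sample = full[:2]

aliases = build_aliases(sample)
output = run_flow(full, aliases)
print(output)  # ['Boston', 'Boston', 'SPRINGFIELD', 'worcester']
```

The last two values were never in the sample, so no cleaning rule was written for them and they reach the output untouched.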
01:56
You can also choose a fixed number of rows to pull.
02:00
It's recommended that this be fewer than a million records for performance reasons.
02:05
Now when we talk about sampling method, Quick Select is the default.
02:10
Using Quick Select, a sample is returned as quickly as possible.
02:14
It will use the first n rows, or cached data that's available from a prior query, to develop your sample.
02:22
Now this option is less accurate but often quicker than the random sample.
02:27
The random sample will return the number of rows requested,
02:31
but it will look at all records and return a representative sample of data.
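To make the difference concrete, here is a minimal Python sketch under stated assumptions. It is not how Tableau Prep implements either method. `quick_select` stands in for "take the first n rows", `random_sample` uses reservoir sampling as one common way to draw a uniform sample in a single pass over all rows, and the district data is made up.

```python
import random


def quick_select(rows, n):
    """Return the first n rows: fast, but biased if the data is ordered."""
    return rows[:n]


def random_sample(rows, n, seed=0):
    """Scan every row and keep a uniform random sample of n rows (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < n:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = row
    return reservoir


# Hypothetical dataset sorted by district: 900 rows for "A", then 100 rows for "B".
rows = ["A"] * 900 + ["B"] * 100

quick = quick_select(rows, 100)
rand = random_sample(rows, 100)

print(set(quick))       # only district A appears
print(rand.count("B"))  # varies with the seed; about 10 on average
```

The quick version stops after 100 rows and never sees district B, while the random version has to read all 1,000 rows, which is where the extra cost comes from.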
02:36
Now using a random sample may impact performance, but only the first time the data is retrieved, before it is cached.
02:43
Once you run your flow more than once, some of the data will be cached in memory,
02:47
and you'll be able to run the flow again with faster processing time.
02:52
Now that we've looked at the data sampling options,
02:54
let's jump into Tableau Prep and choose data sampling options for our own datasets.
02:59
Okay, so we're back in Tableau Prep, and we're going to choose data sampling for all of our inputs.
03:05
For our school level inputs, we're going to choose random samples,
03:09
and for our district level inputs, we're going to use all data.
03:13
Now technically, our data is not huge, so we'll really be pulling all of our dataset through our flow anyway,
03:21
but this will get you some good practice at setting your data sampling.
03:25
Let's go ahead and click Plan of High School Grads.
03:28
We'll pull this up so we can see it a little bit better.
03:31
Choose Data Sample, and we're going to choose a random sample for our sampling method.
03:36
Again, this will be a more thorough sample than the Quick Select,
03:41
but that won't matter too much for performance in our case because our datasets are not very large.
03:47
You will want to consider this for datasets that are large, however, as it could impact performance.
03:53
Let's go to our other school inputs and change to random sample.
03:57
So we'll click on Educator Evaluation Performance.
04:02
We'll go to Data Sample, and we'll choose Random Sample.
04:06
Then we'll go into our SAT Performance.
04:09
We'll go to our Data Sample tab, choose Random Sample here.
04:13
We'll go to Teacher Data, click Data Sample, and click Random Sample here.
04:19
Alright, so that's all of our school inputs, these top four,
04:22
and then the bottom four are going to be our inputs at the district level.
04:26
So we'll click on Per Pupil Expenditures.
04:30
We'll click Data Sample, and instead of Random Sample here,
04:35
we're going to want to modify the "Select the amount of data to include in the flow" setting.
04:40
Now for these datasets, we're going to want to choose Use All Data.
04:44
Now all of the data will be used in our flow.
04:47
You'll notice how sampling method is grayed out.
04:50
That's because you're not sampling if you're pulling in all of the data.
04:53
Sampling is a subset of your total dataset, but if we're using our total dataset,
04:58
there's no need for sampling.
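One way to picture how the two settings interact is the small Python model below. It is purely illustrative: the `SampleSettings` class, its field names, and its string values are hypothetical and are not part of any Tableau Prep API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleSettings:
    """Hypothetical model of the two choices on the Data Sample tab."""
    amount: str = "default"           # "default", "all", or "fixed"
    fixed_rows: Optional[int] = None  # only meaningful when amount == "fixed"
    method: str = "quick"             # "quick" or "random"

    def effective_method(self) -> Optional[str]:
        """Return the sampling method in effect, or None when all data is used."""
        if self.amount == "all":
            return None  # nothing to sample, so the method does not apply
        return self.method


school_input = SampleSettings(method="random")
district_input = SampleSettings(amount="all", method="random")

print(school_input.effective_method())    # random
print(district_input.effective_method())  # None
```

Returning `None` for the district input mirrors the grayed-out control: the method is still stored, but it has no effect while all data is selected.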
05:00
Let's go ahead and click on our Advanced Course Data Source,
05:04
choose Data Sample, and choose Use All Data.
05:08
And we'll go to our Teacher Salaries dataset, click Data Sample, and choose Use All Data.
05:15
And up next, we'll talk about how to refresh your data input connections.