Transcript
00:00:00In this first video I want to talk to you about bar charts. I'm here in my Jupyter Notebook inside
00:00:06of Microsoft Edge, my web browser. I have navigated to the folder that contains the files that I want
00:00:13to work with and in my case it's in my OneDrive file. I've created a folder called YouTube and
00:00:20inside of that data visualization with Plotly for Python. I have a style sheet, a cascading style
00:00:27sheet that will be uploaded to GitHub so that you can use that as well. I have a Python cheat sheet,
00:00:34a PDF, which I'll also download. You can get that from the Plotly website. There's a style template
00:00:39which I always keep saved just to open up, and it's got the bare bones of what I want to work with.
00:00:44But here's the file we're looking for, the simple bar chart. Let's open that up.
00:00:48And there you go. Everything is ready. You'll see the first cell that I have here is just the
00:00:58import of the cascading style sheet. From IPython.core.display I import HTML.
00:01:06I reference the file, this cascading style sheet file and that lives in the same folder as this
00:01:13notebook. And I just use the HTML function there, open the CSS file in read mode and we read it.
00:01:20And if I execute that, you'd see that I have this new style. I have my H1 color of the text here being
00:01:28blue, the H2 being this nice orange, etc. Just styles the notebook for me. So let's have a look at the
00:01:37simple bar chart. Now, first of all, I want to set up my Plotly library and what we're going to use
00:01:44here is called notebook mode. So I don't want to have these files uploaded to the Plotly website
00:01:52to live in the cloud. I just want to use them locally. So I'm going to use the notebook mode.
00:01:57So from plotly.offline, I'm going to import iplot and init_notebook_mode, as you can see here.
00:02:04And I'm just going to call this function, init_notebook_mode. So if we run this,
00:02:08that'll import iplot and init_notebook_mode, and it'll initialize this
00:02:15notebook mode. So what do we actually want to import from Plotly? Well, we're going to start
00:02:19off with these high level charts. So bar charts, scatter charts, etc. These are high level
00:02:26and I can just use them as is. So I can say import plotly.graph_objs, so graph underscore objs,
00:02:34as go. Let's do that. So let's start off with a very bare bones chart.
00:02:41The first thing we want to set up is what is called the trace. We call it trace here as just a
00:02:50computer variable. We are creating this blank figure, and on top of that figure
00:02:56we're going to put these elements, and it's just the norm to call them traces.
00:03:03So go. Remember, that's my graph objects dot bar. So immediately the arguments here are
00:03:10specifically for a bar chart. That's why we call this a high level chart. I don't have
00:03:14to design every single little element. Most of it's already designed inside of this bar object
00:03:20and I just have to pass some arguments to it. And I'm going to pass x arguments for the x-axis
00:03:26and y arguments for the y-axis. So what is a bar chart for? A bar chart is for categorical variables.
00:03:34So anything that is not necessarily a number, although numbers can also be categorical variables when
00:03:42they are not truly numerical, in that the differences between them are not set and standard. So I could
00:03:50call January, February, March 1, 2 and 3 for the first, second and third months of the year,
00:03:54but that does not make them numerical variables. They are still in that instance categorical variables.
00:04:00So bar charts are for categorical variables. So my x-axis here, I'm going to have the categories
00:04:05January, February and March. And on my y-axis, I'm going to have numerical values, so how
00:04:12many things there are, whatever the situation might be. In my instance here, you'll see later
00:04:18on, I'm going to call them sales. So in January, there were 10 unit sales, in February 11, and in
00:04:23March 14. So those are numerical values on the y-axis, categorical variables on the x-axis.
00:04:30Now I'm going to introduce a second computer variable. I'm going to call it data and I'm going
00:04:34to pass into that a list. So it goes inside of square brackets. I'm going to pass a list of all the
00:04:40traces. These are these elements that go on top of my blank figure. And in this case,
00:04:45I only have one trace, but I've got to put that inside of these square brackets. I'm referencing
00:04:49trace here as a list element, an element inside of this list. Third computer variable that I use
00:04:56here is this fig. Now these are the standard. They're used in Plotly all the time. So might as well stick
00:05:02with those. And I'm going to call another graph object called a figure. That's a blank canvas,
00:05:06a blank figure. And figure with a capital F, as you can see there, it takes a couple of arguments.
00:05:13In this instance, we're only going to use one argument and that's the data argument,
00:05:16as you can see here. And we're going to set that equal to data, which is this data,
00:05:20which contains a list of trace and the trace is a bar. So it just builds one thing on top of the other.
00:05:27So inside of this data that I'm referencing, there's a list and the first element in the list
00:05:32is trace. And in trace is this element called a bar chart. Finally, I'm going to call iplot,
00:05:38remember, which we imported up here. iplot, that's for plotting directly in the notebook,
00:05:45not using online mode, not going to the cloud. And I'm going to plot this figure,
00:05:51fig, that I've created. Let's hold down shift and hit enter or return. And there you go,
00:05:55your first very beautiful Plotly chart. So again, we can see the x-axis here,
00:06:02January, February, March, and we can see the y-axis, the values that we put in 10, 11 and 14.
00:06:09Now I'm going to use my mouse on the left hand side. Look what happens when I hover over the text,
00:06:15over this element, the elements on this blank figure. So the go dot figure was this blank
00:06:22background. And I put these elements on top of it, which was on a high level, was a bar chart.
00:06:27I hover over it and you see that January at the bottom, it gets highlighted and the number 10 at
00:06:31the top gets highlighted. That's very nice. I really like that because if I give a presentation
00:06:37and I don't use PowerPoint for presentations, I'm not a PowerPoint user or fan. Well, I mean,
00:06:42you've got to use it sometime, but I try to stick with my Jupyter notebooks. And it's very nice to
00:06:47have these interactive plots as you do your presentations. And look what happened at the
00:06:53top here as well. I get quite a few little buttons here to press and that I really find very useful
00:07:00because I can download this plot as a PNG file directly on my hard drive. And I can put that
00:07:05inside of a hard copy. If I were writing a report and it's got to go inside of a Word document, etc.
00:07:10I can just do that. I can save this for editing in Chart Studio. That's online in your Plotly online
00:07:16account. You can zoom, you can pan, you can select a box and we're going to see all of these because
00:07:22they become very useful. I can zoom in, I can zoom out and out even further and out even further. I can
00:07:29just go back home and that just resets everything for me. I can show the closest data on hover. So if I
00:07:36do this, you're only going to see January being 10, February 11, March 14. And if I click that,
00:07:44you see the highlight happens at the bottom on the x-axis instead of at the top there.
00:07:50And then I can just open Plotly the website itself. So very nice, especially this download as PNG file.
00:07:59So that is fantastic. So let's add a little title because you can see at the top my chart has no title.
00:08:07So I'm going to introduce a new variable called layout and this layout is going to be a Python
00:08:12dictionary. So it goes inside of curly braces and it has the key value pair. So the key colon the
00:08:18value. The key is title and these go inside of quotation marks. So the title that's the key
00:08:25and the value for the dictionary for this key is sales for first quarter. Now I'm going to redo my
00:08:32fig from up here. It's still going to have the trace. It's still going to have the data.
00:08:36All I'm changing is this fig. So go dot figure, my data still is the data from our first one,
00:08:42but the layout is now this dictionary called layout. So this data and layout, these are the argument
00:08:50names and the argument values. We gave the computer variables exactly the same names as the arguments.
00:08:55Don't worry, there won't be any confusion for Python. This Plotly
00:09:00figure will understand what's going on. So data just refers to the data that I created with a trace
00:09:07list and layout refers to this layout dictionary. So let's plot this, shift enter or shift return. And
00:09:14now at the top, I have sales for first quarter. So a beautiful title now on the top of my figure.
00:09:22What about some axis labels? Now with these categories that we put in January, February,
00:09:27March, it's easy to see that these are months, but I might want to specify that. And more importantly,
00:09:31I want to specify what's on this y axis, because what is one, two, three, four? You
00:09:37need to know what these are. So I'm going to just change my layout computer variable. Still
00:09:42going to be a Python dictionary title is still going to be first quarter elements. Now the x axis,
00:09:49that's the key. It holds a value, but the value is another dictionary. And that dictionary is again,
00:09:57a key and value pair. So the title is going to be months. And the same for y axis, the key is y axis.
00:10:05The value is another dictionary. And that dictionary contains a key value pair, the key being title
00:10:10and the value being units. Now I'm going to do something different. Remember up here, we just said
00:10:17data equals data and layout equals layout. But we're going to do something a bit different here. So instead
00:10:22of creating a blank figure, I'm just going to pass everything as a Python dictionary.
00:10:29So here we have iplot. And inside of that, there's this dictionary. And the dictionary is just a
00:10:36key value pair, comma, another key value pair. So data is just data, still the data from above.
00:10:42And the layout is just this new layout. So let's shift and enter, shift and return. And now we can see
00:10:49that we have months here as our x axis title and units as our y axis title. So that's how many units
00:10:56were sold in January, February and March. So see the difference here between the two. So I did not
00:11:01invoke the figure, the go dot figure object here. I just passed everything to iPlot as a dictionary.
00:11:10And sometimes it gets confusing because there's many ways in Plotly to do something. You'll see more
00:11:14ways as we carry on. So when it comes to the layout and such, I try to stick with these,
00:11:20just to stick with the Python dictionaries. When you stick with the dictionaries, it becomes slightly
00:11:24less confusing. And you can find something that's just, you know, stuck in your mind, and it works
00:11:30for you and just carry on with that. So for layout and then plotting, I like to use
00:11:36this dictionary way of doing things, although you don't have to stick with it. As we could see
00:11:40here, we created this blank figure and we passed these arguments to it. But I can also just
00:11:45pass a dictionary to iplot. And as I say, there are many things I can do with
00:11:52this x axis, this x axis being a key in a key value pair. There are many more things than just the
00:11:57title. And we'll see some of them in this tutorial. So it just lessens the
00:12:03confusion for me, but look at the website, the Plotly website, and you might find other ways. And
00:12:08we'll look at some of these in future videos. I stick with this, but look at
00:12:13what works for you. So let's just rotate these labels at the bottom, you know, January, February,
00:12:19March. They're quite small here, so they fit in with this big chart that we have here.
00:12:25But sometimes you have long categorical variable names here, data point values here. And
00:12:32we've all seen plots where these names actually start printing on top of each other.
00:12:36And the easiest way to get rid of that is obviously to shorten the data point values,
00:12:40your sample space values, but sometimes it's not possible. And you just want to rotate
00:12:46them. And look at this. And that's the reason why I like these dictionaries, because look at this,
00:12:51I have layout again, which is a dictionary key value pair, here's another key value pair. But now,
00:12:56I have two elements here in this value side of the x axis being the key and the value being another
00:13:08dictionary. And inside of that dictionary, there are two key value pairs. Title is months,
00:13:13and tickangle is minus 20. I don't put that inside of quotation marks because
00:13:20this is just a numerical value. So I'm going to say minus 20,
00:13:23and that rotates the labels by negative 20 degrees. And my y axis is still all the same.
00:13:28So let's run that. And if we scroll down, we see we have this negative 20 degree tilt
00:13:33from the horizontal. So if you have these long
00:13:40words and sentences there, you know, they can all fit in because of that angle.
00:13:45Now, let's color these bars. The blue is fantastic. I like this color, but you are free to do what you
00:13:50want. So I'm going to just change my trace here. It's still a bar, high level bar chart. I still
00:13:56have the x axis values being categorical January, February, March. I still have my y values being
00:14:01numerical 10, 11, 14. But now I'm going to change the marker. I'm going to introduce this marker.
00:14:08And again, it's a dictionary, but this is another way just to do a dictionary. So I'm going to call this
00:14:13dict here. And the first thing I want to pass is color. And the color is going to be a list.
00:14:21And this list refers to the elements as you pass them on the x axis. So I have one,
00:14:26two, three elements, January, February, March. I've got to pass three colors here.
00:14:30And I'm going to use this format. Note that I have these quotation marks. They can be single
00:14:34or double. They're single in this instance. But it's RGBA. RGBA means I can
00:14:41also pass a fourth value here, which would be transparency, zero being fully transparent,
00:14:46one being completely opaque and anything in between. So it's the red, green and blue channels and
00:14:50then opacity. So RGBA and then in parentheses 255, that's maximum on the red, zero for
00:14:58green and zero for blue, and full opaqueness, so a value of one. The second one is 204, 204,
00:15:06204. So this is going to be this light gray, and it's going to be totally opaque. And again,
00:15:11totally opaque. So January is going to be this red color, and February and March are going to be this
00:15:14light gray color. And the data is still the trace. Now, I can't just use the one I used before
00:15:21because data referred to a different trace. And I've changed the trace here. So I've just got to
00:15:26do data equals trace again as a list element. And then the layout is exactly the same. And I'm passing
00:15:33this dictionary to iplot. Nothing new there. And now you can see these beautiful colors because
00:15:39I can now have January being this red and the other colors being this light gray. Fantastic.
00:15:46Now, why is this one red and these ones gray? Well, I might want to indicate to my audience why this was
00:15:50done. So I can actually change this hover text. Remember, if I hit this one, show closest data on hover,
00:15:57it's going to show both the x-axis and y-axis values in one little tooltip there, or hover text,
00:16:03January, 10. And if I switch back, it's just going to highlight them separately. Now, hover text means I
00:16:11can individualize every element that I hover over to have its own text. So we've had marker, we've seen
00:16:19that. So I'm introducing a new argument to the bar here, text. And I've got to do it
00:16:27individually. So the red one is going to be below target, then above target and above target. Everything
00:16:32else is the same. Let's run that. And now if I hover, I see this extra text appear in this hover. So that
00:16:38was, I put a 10 and that's below target. And that one was above target. And that one is above target. So you
00:16:45can see, you can build in this beautiful narrative here because you can put a lot of information in
00:16:51this text. If you really want to, to draw attention to what's happening here to inform your audience.
00:16:56So that's fantastic. So let's move things up with a grouped bar chart. And now I want to have two
00:17:03sets of elements on here. And so I've changed my computer variables to trace zero and trace one. And
00:17:10with a bar chart, be careful now because we want the same sample space for both. I've got January,
00:17:16February, March and January, February, March. And I've got my Y values 10, 11, 14 as before. And the
00:17:22second Y is 12, 13, 17. But now I'm going to add a name because I need to tell Plotly that these two
00:17:28things are separate. So I'm going to say name equals last year. And this name equals this year. So you can
00:17:33well imagine that I'm just going to take this year's data and compare it to last year's data. That might be
00:17:37interesting. And now look how data has changed. I'm now passing two elements to this list. Still
00:17:42got to be a list still inside of square brackets. I've got trace zero and trace one. The layout is
00:17:48going to be exactly the same, but I'm introducing this new key value pair of barmode. And
00:17:54the barmode that I want to use here is group. So what it's going to do is going to group January
00:17:59and January, February and February, March and March. Hence, they've got to be the same. And then I'm still
00:18:04going to use this dictionary to pass to iPlot. And now they've been grouped and we see last year
00:18:10in blue and this year in orange, that will be the default colors. And you see that they are indeed
00:18:15grouped. So I've got this year and last year, this year and last year, this year and last year,
00:18:19a beautiful way just to do that. Now we needn't group them like this. We can stack them as well.
00:18:24And here I have exactly the same thing, but instead of barmode being group, I've made barmode
00:18:29stack as the key value pair here. And if we run that, we see that we now have this stacked
00:18:36version of it. But if I hover, you know, you can still see that this year is 12 and last year was
00:18:4210. You don't have to go to the y-axis in your mind, and your audience doesn't have to mentally try and see
00:18:46where the top is and subtract the 10 to see that it actually gets to 12. No, no, Plotly makes
00:18:52it brilliantly easy. With your hover, you can give a beautiful presentation just to hover over these and
00:18:59explain. So that's our first tutorial on Plotly, my all time favorite library for plotting inside of
00:19:05Python, but inside of other languages as well. And a nice introduction for you. Start playing with it
00:19:12and we'll carry on with this playlist on YouTube and I'll introduce you to a lot more plotting using
00:19:18Plotly for Python. Before you go, please subscribe to this channel for all the information
00:19:24that I'm trying to get to you. Hit that notification button, the little bell there, to let you know
00:19:30when new videos are uploaded. Thanks a lot.
00:19:44My name is Juan Klopper and I'm a surgeon at Groote Schuur Hospital and Senior Lecturer at the
00:19:48University of Cape Town. Now I run the Klopper Research Group where I really support our postgraduate
00:19:53students in their research. My main field of research is machine learning in the clinical
00:19:58setting. In the support of research, I have a lot of online courses here on YouTube, but also on Udemy
00:20:04and the massive open online course platform, Coursera, where I really teach biostatistics.
00:20:09It's what it's all about. Now, when we use Python locally, one of my favorite plotting libraries is
00:20:15Plotly. You can do Plotly online, but you can also use it in Python and many other computer
00:20:20languages as well. So this whole series is going to be about Plotly and I'm going to start off
00:20:24by just showing you the basics. And the easiest way to get going is through a bar chart. So
00:20:29subscribe to this channel, hit that notification button. Every time a new lecture comes out,
00:20:33a new tutorial comes out, you'll be notified about it. So please do that. Let's watch this first video
00:20:38on Bar Charts. So in today's tutorial, we're going to talk about the humble histogram. So in my first
00:20:56cell here in my Jupyter Notebook, I'm just importing my cascading style sheet as per usual, just to get
00:21:02away from this bland looking black to just something a bit more interesting. There we go. If I run the
00:21:07cell, we see these nice little colors and different fonts for the markdown that is used. So let's set up
00:21:16the library. I'm going to import from plotly.offline my iplot so that I can plot directly in this
00:21:23Jupyter Notebook, and init_notebook_mode just to initialize the notebook. We did that with the
00:21:30previous tutorial on the bar chart. We're going to do exactly the same here. And again, we're going
00:21:34to make use of one of the high level charts. So we're going to import plotly.graph_objs as
00:21:39go. We'll do that import as well. Now for this case, I just want to generate some random values.
00:21:46So I'm going to import the random module, the Python random module. So that's very easy, import random.
00:21:53And I also want to import the numerical Python library, NumPy. And I'm going to use this
00:21:58namespace abbreviation. So I'm going to say import numpy as np. Now I'm going to just seed this
00:22:04pseudo random number generator so that we get the same results every time we generate these
00:22:09random values. And I'm just going to seed it with the value 1234. And there we go. So let's create
00:22:15a few random values. My first variable here is called age. And I'm going to just use a uniform
00:22:23distribution. So I'm going to say np.random. So I'm using the random module in this
00:22:29NumPy library. And from there, I want the uniform distribution. My lowest value must be 21. My highest
00:22:35value is 75. And the last argument is size, meaning I want 100 values from that distribution.
00:22:43The next one, I'm just going to use the computer variable salary. Again, that's np.random, and
00:22:48this time from a normal distribution. I want the mean, which has the argument loc, to be 3000. I want
00:22:57the standard deviation, which is the scale argument here, to be 1000. And again, the size,
00:23:04the number of samples I want drawn, being 100. Now, purely for
00:23:13the sake of simplicity, I'm going to use a binary gender here,
00:23:20and I'm going to create the computer variable binary gender. And I'm going
00:23:25to use, from the random module, random.choices. And what you can do with
00:23:31this choices function is just to pass a list of all these various text, the string elements that you
00:23:39want to choose from. So I've got female, then male, and I'm going to say draw at random 100 of those.
00:23:45And it is, it is drawing that at random with replacement. So the first one might be female,
00:23:50it puts female back in the box, draws another one blindly out of the box, it might be female again,
00:23:55throw it back. So 100 of those choices. Now I want to place these inside of a pandas data frame. It's a
00:24:03beautiful thing to use. I urge you to have a look at pandas. I'm going to import pandas and use the
00:24:08abbreviation pd. And I'm doing that because I want to create a data frame. I'm going to call my data
00:24:13frame df. And I'm going to call the data frame function here, pd.DataFrame. And I'm going to
00:24:18create this data frame out of a dictionary. So I'm going to have this age column. And underneath
00:24:26that age column, think of a spreadsheet. So the header, in the first row, I have my header values,
00:24:31my variables, and I'm going to call this first one age. And down that column, all the rows are going
00:24:36to be populated by these 100 age values. The salary is going to be the next column header in my
00:24:42spreadsheet, basically, which is what a data frame can be seen as in a simplistic way. I pass all the
00:24:49100 salary values and then gender, these binary gender values of mine. So let's run that. And I've
00:24:54created a data frame. And I'm going to split my data frame in two, one for female and one for male.
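The data setup can be sketched as follows. Seeding both NumPy and the standard `random` module is an assumption (the video mentions a single seed of 1234), and so is the capitalization of the two labels. The column names `age`, `salary` and `gender` are the ones used later in the video.

```python
import random

import numpy as np
import pandas as pd

# Assumption: seed both generators so every draw below is repeatable.
np.random.seed(1234)
random.seed(1234)

# 100 ages from a uniform distribution between 21 and 75.
age = np.random.uniform(low=21, high=75, size=100)

# 100 salaries from a normal distribution: mean 3000, standard deviation 1000.
salary = np.random.normal(loc=3000, scale=1000, size=100)

# 100 draws with replacement from the two labels.
gender = random.choices(["Female", "Male"], k=100)

# Dictionary keys become column headers, values fill the rows.
df = pd.DataFrame({"age": age, "salary": salary, "gender": gender})

# Boolean indexing on the gender column splits the frame in two.
female = df[df.gender == "Female"]
male = df[df.gender == "Male"]
```

`df.head()` in the notebook is a quick way to confirm the three columns look as expected.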
00:24:59And the way we're going to do that is to create these two computer variables,
00:25:03female and male. The first one says, take this data frame. And then we're going to use this square
00:25:09bracket notation, because it wants to run down row by row. So I'm just indexing on this df.gender column.
00:25:16So it's going to look down the gender column and only select those rows where this boolean test,
00:25:21with the double equals sign, is true: only the females. And then male will only contain the males. So I
00:25:26have two data frames now, female and male. So let's create our first bare bones histogram,
00:25:31now that we have some data. Again, per usual, just by convention, I'm going to use this trace
00:25:37computer variable. And it's go, because this is a graph object, one of the high level graph
00:25:42objects, and it's a histogram. I'm going to pass age as the x value. Let's just stop there for a
00:25:47moment, just to discuss what a histogram is. Whereas a bar chart looked at
00:25:52categorical variables on the x axis, the histogram is going to look at numerical variables. So age is
00:25:58definitely a numerical variable, a ratio type numerical variable, because there's a true zero.
00:26:04And we're going to split that up into little bins. And if we just pass the age here,
00:26:11Plotly will decide how large those bins are. So what are bins? Well, as soon as I show you the
00:26:16histogram, I can show you what bins are. Again, the computer variable data and pass the list of all
00:26:22these histogram elements I want to put in my figure. And then instead of using go.Figure,
00:26:26I'm just going to use this dictionary notation. So iplot and then data. And we see here,
00:26:33look at the bottom, this is new, these are numerical variables, it's not categorical.
00:26:38And what it's done is, it seems to have plotted from 20 to 25, 25 to 30, 30 to 35, 35 to 40.
00:26:47And that's what I mean by these little bins. So all it's going to do, it takes those 100 values,
00:26:51and it, and it decides how many are between 20 and 25. And it'll count them. And it noted that there
00:26:57was eight. And between 25 and 30, there was 30, between 30 and 35, there was five. So between 25
00:27:05and 30, there was eight, etc. And so it goes on. So that is why a histogram is not a bar chart. And
00:27:11what you can see here also, there are really no spaces between these, just to indicate that we're
00:27:16talking about a continuum here, whereas a bar chart by definition has these gaps
00:27:22in it, just to create this visual impression that we are dealing with categorical variables, that this
00:27:26is not a continuum. And here we do have a continuous random variable at the bottom called age. Now let's
00:27:33just change that to a frequency distribution, or otherwise known as a normalized histogram,
00:27:38because what you can see on the top one here is we count, we count how many were between 20 and 25.
00:27:43But if we normalize that, it gives us a frequency distribution. And the way that we do that is,
00:27:48in this histogram element that we create at the top, we pass a new argument, histnorm, and we
00:27:56set it equal to probability. I'm going to introduce a layout here in a dictionary format. So
00:28:03I have title, and it's frequency distribution of the age variable. And for my x axis, I'm going to bring
00:28:09in a title, also as a key value pair, the value being a dictionary itself,
00:28:15consisting of a key and a value. And I plot via a dictionary as well. And now we can see frequency
00:28:23distribution of age. I have age in five year increments here at the bottom. But most notably,
00:28:29you'll see now that this is normalized. So the heights of these bars, which is
00:28:34the height of this rectangle, and this rectangle, and this rectangle, and this rectangle, are going to
00:28:38sum up to be one. These bins are mutually exclusive, but collectively exhaustive.
00:28:45So they are all here. And the heights of this stepwise curve are going to sum to one. So it's a frequency
00:28:52distribution. So if I want to get how many there are in a bin, I take its height, 0.08, and multiply by the 100 values
00:29:00to get to the value that we had there. Well, let's have a look. So that was eight.
00:29:08So remember, these are now fractions of one. So 0.08 times 100 is going to give you
00:29:16that eight. So let's create two in an overlay fashion. So to do that, I'm going to create two
00:29:22traces by convention, calling them trace 0 and trace 1. So go dot histogram, and I have x equals
00:29:29female dot age. And this time, I'm going to give it a name so that we can have this little sidebar
00:29:34that indicates what is what. And the second one that I'm plotting, it's going to layer them from the
00:29:39back to the front. So the back one is going to be female, the one in front is going to be male. So I
00:29:45just want to lower the opacity of the front one so we can still see the bottom, the bottom histogram,
00:29:52the bottom trace through that. So I'm going to set opacity equals 0.8. Everything else being the same,
00:29:58I'm bringing in my little layout there. And now we can see I've got two plots here. It's going to give
00:30:03me this little legend on the side. And I've created a bit of transparency here in the orange, which is the
00:30:09male, just so that this female at the back shines through. But if we hover over them,
00:30:15beautiful what Plotly does, you can see that it's still going to give you
00:30:20the values. Let's stack them. So I don't have to do the opacity there. And it is just going to stack
00:30:28them, but it's still going to give me these values to say, in male, there was four, and in the female,
00:30:34there was 12. And look at one thing though, a difference that it made, it is now making these
00:30:40blocks of unit length of 10 years. So I've put that in the title as 10 years. That didn't influence
00:30:47how this was done. I drew this first and saw that it automatically chose 10. And I put the
00:30:5310 in there after the fact. So let's do that. Let's control the bin size. And the way we do that
00:30:58is by introducing inside of our traces here, we introduce the x bins argument. And I'm going to
00:31:04pass a dictionary for that in this format, not the curly braces format. So start is 20, end at 80. So
00:31:12just the left hand side of the x-axis, the right hand side of the x-axis. And I want a bin size of
00:31:16five units. I'm going to do the same for male. And I'm changing my title here to five, because now
00:31:21I'm in control of it. And if we run this, we'll see that we are back with these increments
00:31:29of five. Let's just do that again. This time we're just going
00:31:37to stack them. And just for argument's sake, this time I'm looking at female.salary and male.salary.
00:31:44And we have introduced a bin size of 200 here, starting at 500, ending in 5,500. And because we
00:31:53drew that from a normal distribution, you can see this tending towards this normal distribution.
00:32:00The last thing I want to show you just in this little tutorial is the cumulative histogram.
00:32:05And when you do statistics, you start looking at cumulative distribution functions, etc.
00:32:11This is important, but we're not dealing with that at the moment. This is just a histogram.
00:32:15And the way that I do that inside of this histogram object that I've created, I'm going to use the
00:32:19argument cumulative and pass to that a dictionary with a key value pair of enabled being true. And I
00:32:24use the dict function instead of the curly braces notation here. And if we just execute that, we see this
00:32:31beautiful stepwise. And we can see the larger steps here, meaning we are dealing with a normal
00:32:38distribution. You can see that from this cumulative distribution function. So that is that for the
00:32:43humble histogram. Remember, that is for numerical variables. And we're just going to start counting
00:32:48them or putting them in a frequency distribution by creating these artificial little bins. And we can
00:32:53control the bin size as I've shown you. So that's the histogram. I'll see you in the next lecture,
00:32:58where we continue our look at this wonderful world of Plotly and this wonderful library,
00:33:04plotting library that Plotly is. Please, please, please remember to subscribe and hit the notification
00:33:08button, because that's going to allow you to know when new videos, new tutorials do come out.
00:33:13I'll see you again.
00:33:24Another tutorial on Plotly using Python. So here we are in a Jupyter notebook. I'm going to execute this first
00:33:31cell as per usual, importing my cascading style sheet so that we have a bit of a better looking
00:33:37notebook here. So distribution plots. Let's start off by just importing our Plotly library. As per
00:33:45usual, I'm importing iPlot because I want to plot right inside of my notebook. And I've got to initialize
00:33:52this notebook mode. So let's do that. And then I'm going to import the high level graph objects
00:33:58as go. So import plotly dot graph underscore objs for objects as go. But something new, I'm also going
00:34:06to import the figure factory as ff. Let's do that. Let's create a bit of data for us for working with
00:34:17on this notebook. So I'm going to import the random module. And I'm going to import NumPy using the
00:34:23standard abbreviation np. And I'm also going to seed the pseudo-random
00:34:29number generator. So let's create three computer variables. I'm going to call them age, salary,
00:34:34and binary gender. So for age, we're going to do a uniform distribution. So NP dot random dot uniform.
00:34:43The low must be 21, the high 75, and I want a thousand data point values in that uniform
00:34:49distribution in that domain from 21 through 75. Next, the salary I'm taking from a normal distribution.
00:34:56And you see the three arguments here. LOC means the mean, scale means the standard deviation,
00:35:04and size is just the number of values I want. So from a mean of 3000 with a standard deviation of 1000,
00:35:11I want 1000 data point values from this normal distribution. And lastly, from the random module,
00:35:17we're just going to do random dot choices. And purely to keep things easy here and easy to
00:35:24explain, at least, we're going to stick with this binary sample space for
00:35:33my gender variable here. So only female and male just to make things easy and 1000 of those,
00:35:38please. I'm going to import pandas because I just want to create a data frame. And here we have the
00:35:43computer variable df. And I'm going to do a pandas dot data frame. And I create it from key value pairs
00:35:49with a Python dictionary. So I have age as my column header, and then the age variable that we
00:35:56created, salary with salary, and gender with binary gender. And I'm just going to create two
00:36:02sub data frames there. So if we look down the gender column only include female and down the gender
00:36:08column only include male. And I'm going to call those sub data frames, female and male.
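A rough sketch of the data setup just described is given below. The seed value and the exact column names are assumptions for illustration; the distributions and sizes follow the narration.

```python
import random

import numpy as np
import pandas as pd

# Seed both generators so the values repeat (the seed itself is arbitrary).
random.seed(12)
np.random.seed(12)

age = np.random.uniform(low=21, high=75, size=1000)
salary = np.random.normal(loc=3000, scale=1000, size=1000)  # loc = mean, scale = sd
binary_gender = random.choices(["Female", "Male"], k=1000)

# Column headers are the dictionary keys.
df = pd.DataFrame({"Age": age, "Salary": salary, "Gender": binary_gender})

# Two sub data frames, selected by the value in the Gender column.
female = df[df["Gender"] == "Female"]
male = df[df["Gender"] == "Male"]

print(female.head())  # first five rows
print(male.tail())    # last five rows
```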
00:36:13So let's run that. And then just let's look at the first five rows. So we just call the head
00:36:19method there, female dot head. And we have the first five rows. We see the age there.
00:36:26And then the gender column will only find females there. And we find the salary column there. So you
00:36:30can see with these pandas data frames, they actually look like just flat spreadsheet kind of files.
00:36:37And let's look at the last five rows of the male. And again, just to make sure the gender column will
00:36:43only contain males. So let's create our first bare bones distribution plot. And the way that we're
00:36:50going to do that is to create a computer variable called fig. And that is going to be ff for our figure
00:36:55factory, dot create_distplot, which is one of its functions. And it takes a couple of arguments.
00:37:02The first one is hist_data. And that's the data we want in this distribution plot. Now,
00:37:07this distribution plot is going to look like a histogram; it is nothing other than a special kind
00:37:11of histogram. So we've got to give it a list to work with. And what we're going to do is take the
00:37:18whole data frame, go down the salary column, take the values in that, and then call the tolist
00:37:25method on those salary values, because we just want to create a
00:37:31list to work with. And you see it is there inside of the square brackets. Next, the group labels.
00:37:39Well, we're only going to do one group here, and I'm just going to call it salary, salary distribution.
00:37:44And as with a histogram, we have to have a bin size. And for each, we have to have its own bin size,
00:37:50but we only have one here. So in our list, we'll only have 200. So that might not make a lot of sense
00:37:55until we actually see what the distribution plot looks like. So let's run that. And there we have
00:38:00our distribution plot. We have the nice histogram down the bottom here. And indeed, the bin size is 200.
00:38:10And we also see this kernel estimate here, kernel density estimate, as it tries to draw this
00:38:18distribution line here. And there is our group label. We're only plotting one
00:38:23thing, and that's the salary distribution. So there we go. This is called the rug plot underneath.
00:38:28And each of these little vertical lines is actually one of our salary values. And you can see the
00:38:34distribution. You can also see that we took this from a normal distribution.
00:38:37And you can see the Gaussian type or bell shaped, at least that it attempts to take there.
00:38:44So let's just add a title. And we're going to do that by just using one of the ways to do it,
00:38:52at least. And that is just to call fig layout dot update. So I've created my figure, just as above,
00:38:59and I'm going to update the layout. So just another way of doing it, instead of doing it by a dictionary,
00:39:05as you've seen before, and I'm just going to add a title. And that title is just going to be salary
00:39:10distribution. There we go. We've got a beautiful, beautiful title up above.
00:39:16So that's not too much fun. Let's just create two datasets. So now my hist data, I'm going to
00:39:23make a list of those. So there's just so many ways of doing things in Plotly. And you might find
00:39:29that confusing to start off with, but it also creates a lot of power. And you can find the way that
00:39:33works for you. So here, I'm going to take hist data, create a computer variable, and I'm going
00:39:37to pass a list of values. The first list, I'm going to take the female sub data frame,
00:39:42the salary column, the values in that column, and then create a list of that. So the two lists
00:39:47there. And then same for the male, my group labels are now going to be female salary and male salary.
00:39:53And now I'm going to create my fig with create_distplot. And I just pass the
00:39:59hist_data there. So I'm not saying hist_data equals hist_data, because these arguments are in
00:40:04the standard order, so we don't actually have to use the keyword names. And then group_labels
00:40:08is going to be that list. And then my bin size, I want 200 and 200. So the same bin size for each,
00:40:13although you can make the bin sizes different for each of those. Let's do that iplot. And now we
00:40:18can see we have male salary in orange, then female salary in this bluish color. And you see the rug plot
00:40:26for each of those. Beautifully done. Let's change the colors of this. So everything exactly the same,
00:40:34but I'm going to bring in a new argument to my create_distplot here, and that's colors. I'm going to do
00:40:39an RGBA with the opacity here, 0.8 and 0.8 for the opacity. You can see 20, 20, 20. So that's
00:40:47going to be very dark gray and 150, 150, 150. It's like sort of a middle gray color. Let's run that
00:40:53and have a look. And there we go. You can see the light color for male, the darker color for female
00:40:58there. And because we set the opacity, so you can actually see the one shine through the other.
00:41:04Now, instead of this kernel density estimate plot here, we can actually just use the data that comes
00:41:08out of that and create a mean and a standard deviation so that we can create this normal
00:41:12distribution instead of this kernel density estimate that we see there. We again have our
00:41:20hist_data, our group labels. We call create_distplot. We have the hist_data, the group labels, the bin sizes,
00:41:26but now the curve_type is new. It's a new argument and we're just going to set it to normal. And here's
00:41:30just one other way that we can update this layout or create this layout. So I'm going to call fig.layout
00:41:37instead of putting layout in quotation marks and square brackets. I'm just calling dot layout and dot
00:41:43update. And I'm passing this dictionary to it. So key value pair, the title being fitted. So just
00:41:51another way. It just makes it so powerful and easy to use. You can use whatever way it fits you. So now
00:41:56we can see this normal curve that it took from the data, just doing the mean and the standard deviation
00:42:03so that we can draw this normal distribution here. And you can see the two values there for male and
00:42:09female. So in case you want to omit some things, there's three things here. That's our curve, our
00:42:15histogram and our rug plot. So we're going to omit a few things. So we're going to say show histogram
00:42:20as being false and show the rug plot also as false. Everything else exactly the same, except that we've
00:42:28added an x-axis inside of our update to our layout here as a key value pair, the key being x-axis, the value
00:42:35being another dictionary and that dictionary having two key value pairs, title being salary and the
00:42:41domain being a thousand to five thousand. So we can even bring that in. And there we go. We just have
00:42:46these two very nice smooth curves there. So you can see with this distribution plot, you can do so much
00:42:52and you can well imagine some data that would look beautiful if represented with these distribution plots.
00:42:58I'll see you in the next tutorial.
00:43:05Here we are in the next tutorial. We're going to look at the scatter plot. First of all, my cascading
00:43:15style sheet. Let's run that. And we have this notebook that looks a bit better. Scatter plots.
00:43:20Let's set up our Plotly library. As per usual, we're going to import iPlots so that we can plot right
00:43:24inside of the notebook. And we've got to initialize this notebook mode. So we're going to import it and
00:43:29then we're going to execute that function with these parentheses there. I should say
00:43:45We're also going to import the high level chart objects. So Plotly.graph objects as Go, G-O.
00:43:50There we go. Indeed. We're going to import the numerical Python library. So NumPy. And we're going to use the
00:43:57abbreviation np. And then we're going to seed the pseudo-random number generator. So let's go
00:44:02NumPy.random.seed. Let's create a bit of data with some data point values for us to work with. We're
00:44:06going to have one, two, three, four, and then five, six, seven, eight computer variables here. Female age,
00:44:12male age. So we're going to stick to, just for the sake of ease of explanation, just to this binary
00:44:17view of gender. So only female and male. So let's create these data points.
00:44:40We have the age there. We have salary and we have debt. So female
00:44:45age is going to be from a random integer with a low of 20, high of 65, and 100 values. And then the
00:44:51same for males. The salary, we're just going to play a bit. So we're going to take female age and
00:44:58we're going to add to that a random uniform value from negative 10 to 10 and add another thousand.
00:45:03So this is going to be element-wise. So for each element of the 100 values that we have here
00:45:08in this NumPy array, we're going to add the corresponding value from this uniform array of
00:45:15100 values and add another thousand. And we're going to do the same for male salary there. And then we're
00:45:20going to write female debt equals male debt equals. So that's the way to make two computer variables and make
00:45:24them exactly the same as each other. And that's going to be a random integer from 15 to 30 with
00:45:30100 values. So instead of using the keyword names, saying low equals, high equals, because these are
00:45:35in the standard order, you don't actually have to use the names. And so it's 15, 30, 100. And then the tax,
00:45:41we're going to create some of those. And again, just add 10 to each of those values. So let's run
00:45:49that. Let's do a bare bones scatter plot. That's what it's all about. So we're going to go go dot
00:45:53scatter, scatter being this high level plot that we can use, what is created on top of our figure.
00:46:00And we're going to use x equals female age, y equals female salary. So when you see here with
00:46:05a scatter plot, it's numerical variable against numerical variable. And each one of them will be
00:46:10each dot that we create in the scatter plot will be part of a pair. And the mode that we're going to
00:46:15use is just markers. And data equals trace, so just this trace as part of a list. And we're going to use this
00:46:24key value pair of a Python dictionary to pass it to iplot. And let's run that. And there we can see,
00:46:30from the way that we created it by adding those values, that there's some sort of correlation between what
00:46:36we have at the bottom, the age, and the salary on the left hand side, the y axis that we can see here.
00:46:41So those are quite small dots. We can really do something about that. So let's change these markers.
00:46:46So I'm going to have mode still being markers, but then for marker, we're going to pass a dictionary of
00:46:51values. So the dict function here in Python. So the size being 12 and the marker being this orange
00:46:58color with a bit of opacity there, only 90% of the opacity. And let's change the layout. So the way
00:47:05that I'm going to use layout here is again as a Python dictionary. So we're going to have title being
00:47:11correlation between female age and salary. The x axis is the key. The value is another dictionary with a
00:47:18couple of key value pairs, title being age and zero line being false. And with the y axis title being salary
00:47:24and the zero line being false as well. And I plot the data is data and the layout is layout as per
00:47:30usual. There we go. So now we have an x axis title here, a y axis title. We have a title here at the top,
00:47:38correlation between female age and salary. And we see these much larger orange dots. And if I just hover
00:47:45on one, you can see that the value for them for that one was 1039 and the age was 34 that we can see
00:47:53at the bottom. We can change that. And now we can see them plotted the hover there being 31, the age
00:48:01and 1021 being the salary 0.161. So let's do more than one data set. So for that, I'm going to create
00:48:10two separate traces and one being the female, one being male. And again, it's age against salary.
00:48:16I think you know what's going on here now. Data will be the list of the two traces and the data.
00:48:22I'm going to just pass the data that I've created here, this list of the two traces
00:48:26to the data key value in my dictionary here. And we can iplot that. So there we go. We've got female
00:48:32in orange and the male in this blue, and we can see all the values as we go up. We can see this
00:48:38beautiful correlation between age and salary there. So let's add a third variable in this 2D space.
00:48:44And that's what scatter plots are all about. And I can do that in a few ways. One is by marker size.
00:48:50And, and the other one is by marker color. So let's start with this, with this marker color.
00:48:55And that's to introduce a color scale. And you can see all the color scales that are available, grays,
00:49:00and this, and that, and that, and that, and that, et cetera. We can use Portland. There's Portland there.
00:49:05So we're going to create a trace and it's X equals female age, Y equals female salary,
00:49:10mode being markers, the marker being a dict with a size of 10. The color is going to be the
00:49:17female debt and the color scale is Portland. So we have age, salary, and debt all in the same 2D plot.
00:49:26And that's going to create a scale on the right-hand side, a color scale. And we set show scale to true.
00:49:31Look at the layout, what we've done there. Let's do the iplot. There we go.
00:49:35And now we've added this third variable because we've got age and salary, but this color is also
00:49:41going to be this color that we introduced here as the debt level. So down from 16 up here, 28.
00:49:48So these red ones have more debt, say, than these blue markers here. So that's one way to introduce
00:49:54a third variable. The other one is just by way of what we would know as a bubble chart,
00:49:59actually. But that's just the marker size. So what we're going to do here is just change from female
00:50:04to male. And the size of the marker is passed as part of a dictionary
00:50:11here. And we give it a color. Let's have a look at this. So now we see that this debt is now the size
00:50:21of these. So the larger the size, the larger the debt. So that's one extra way of bringing in
00:50:28this third variable. So that means we can actually have four variables that we plot in 2D space because
00:50:33we can just combine the color scale and the size. So here I've made the size
00:50:40the debt, and the color the tax. So that I have four variables actually drawn right here in my
00:50:47two-dimensional scatter plot. And that's actually quite fantastic to do. So we've added these four
00:50:53variables just in this flat plot by just looking at this bubble size, the marker size, and then this
00:51:00color scale. And you can see there the size was the debt and the scale here was the tax. So the higher the
00:51:08tax value here, the more brown these values are. This time we've used Earth
00:51:14just as the color scale that we have on the right hand side. So have a lot of fun with your scatter plots.
00:51:19You can really convey a lot of interesting information just using scatter plots. I'll see you in the next
00:51:26tutorial. Please remember to subscribe and hit the bell so that you can receive
00:51:33notification of all the tutorials that I do upload to YouTube.
00:51:49So welcome to this new tutorial. We're going to look at line charts, something that you might want
00:51:53to use quite often. So first cell, of course, we're going to just run our cascading style sheet.
00:52:00Then we have a header one and a header two, setting up our Plotly library. And as per usual,
00:52:06we're going to import iplot and initialize the notebook mode. So let's just do that, initialize
00:52:11the notebook mode. And then we're going to import plotly dot graph objects, graph underscore objs, as
00:52:18go. And there we go. Let's create some data point values. I'm going to create two computer variables,
00:52:22they're going to be days and sales, and days going to go from Monday through Sunday. And sales,
00:52:27I'm just going to have these seven values, 11, 14, 11, 14, 10, 11, 10. And that's it. Let's run that
00:52:33to execute. And we now have our two computer variables. Let's just do a simple line chart.
00:52:39So to do a line chart, we're actually going to use the scatter. So scatter as in scatter plots.
00:52:45So what we're going to pass is just these three arguments, x being the days, y being the sales,
00:52:50but the mode just being lines. So our data is trace. And we're going to use this key value pair,
00:52:56Python dictionary here to do the iplot. And let's run that. And we see that we have the lines. And we
00:53:01go from 11 up to 14 down to 11, up to 14, down to 10, up to 11, and down to 10 for Sunday. So
00:53:10beautiful there. What we can also do is just to fill up the area under this curve, under the lines that
00:53:19we have here. So we're going to have fill, and we're going to fill to 0x. So whatever the lowest
00:53:24line here is, we're going to fill to there. The fill color, I'm using a hex code in this instance,
00:53:28my mode is still lines. And let's plot that. And we see it's just going to use the default color,
00:53:33and it's going to fill to zero, meaning down to this bottom line of the y value here, filling up everything
00:53:39below the line. Now there are different line types that you can try. There's actually dash, dot and dashdot.
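The simple line chart and the filled version can be sketched with plain dictionaries, which Plotly accepts in place of graph objects (`go.Scatter` takes the same keywords). Two assumptions here: the fill value is taken to be `"tozeroy"`, which fills down to y = 0 (Plotly also has `"tozerox"`), and the hex color is made up.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
sales = [11, 14, 11, 14, 10, 11, 10]

# A line chart is a scatter trace whose mode is "lines".
line_trace = dict(type="scatter", x=days, y=sales, mode="lines")

# Same trace, with the area between the line and y = 0 filled in.
filled_trace = dict(
    type="scatter",
    x=days,
    y=sales,
    mode="lines",
    fill="tozeroy",       # assumed fill mode
    fillcolor="#ffa500",  # hypothetical hex color
)

# The figure is a dictionary with a "data" key holding a list of traces.
fig = dict(data=[filled_trace])

# In a notebook: from plotly.offline import iplot; iplot(fig)
```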
00:53:45So let's do mode as lines again. And the line, we're going to pass to that a dictionary because
00:53:51it has these sub key words that it can use. So color, we're going to stick to this color.
00:53:57The width, we're going to do a width of 4. And the type, we're just going to do a dash.
00:54:02And what we're also going to do is add a layout with the x-axis key value pair, key being x-axis,
00:54:08the value being another dictionary with a key value pair, zero line being false. We don't want to have
00:54:13the zero line at the bottom. And there we go. We see this orange color. We see it's quite thick
00:54:19with a 4. And everything is still there. We just change this line type. Now we can also add some
00:54:25markers. So instead of the mode just being lines, we can have lines plus markers. And we're going to give
00:54:31the size to these markers. So marker, we're going to set equal to a dict with one of the arguments
00:54:38being size and the size being 16. And this time I'm going to add a layout. The layout I'm going to do
00:54:44as a, just as a Python dictionary here. So it's going to have a title and an x-axis. So the key
00:54:51value pair here is title and the title sales for last week. But what you can see here, I've got some
00:54:56HTML code in here. So the i tag for italics and the closing i tag. So we can even do that as far as the title
00:55:02is concerned. For the x-axis key, its value is another dictionary with a key value pair, the zero
00:55:07line being false again. And then for iplot, I'm just using this dictionary way to do it. And as I
00:55:15mentioned before, there's so many ways to do things in Plotly, which might make it confusing
00:55:20initially, but actually makes it much more powerful. And you can actually choose, you know, what works
00:55:24for you. So there we go. We've added the sales for last week. So we've got this title. We can see the
00:55:29last week is in italics. And now we've added these markers that might give it a bit more clarity as
00:55:36to what is going on here. Now we can also do some interpolation. These are just straight lines. So
00:55:42let's do a spline interpolation. So what we've got here is mode again being lines plus markers,
00:55:48the marker having a size, but the line, we're going to do one of its arguments there being shape.
00:55:55And we're going to make the shape a spline. So let's run that. And there we go.
00:56:03We can see instead of these straight lines, we have the spline curve in between these values.
00:56:08We can also do vertical and then horizontal. So the shape is VH first vertical and then horizontal.
00:56:16So let's run that so that you can see. So what it will do from this value, it will go vertical,
00:56:22vertically first until we get to the level of the second one and then go horizontal.
00:56:27So it's going to go vertical to the level of the next one and then go horizontal as opposed to HV,
00:56:32which is horizontal and then vertical. So now it's going to go horizontal to the line of the second
00:56:39one and then vertical up. So horizontal first and then vertical. So you can play
00:56:45with those two. And there's actually a few more ways that you can go about this.
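The line styling options covered so far can be collected in one small sketch, again using plain dictionaries. The helper `make_trace` and the orange hex code are hypothetical; note that in Plotly the dash style is set with the `dash` key of the `line` dictionary.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
sales = [11, 14, 11, 14, 10, 11, 10]


def make_trace(shape="linear", dash="solid"):
    """Build a lines+markers trace with the given line shape and dash style."""
    return dict(
        type="scatter",
        x=days,
        y=sales,
        mode="lines+markers",
        marker=dict(size=16),
        line=dict(color="#ff7f0e", width=4, dash=dash, shape=shape),
    )


dashed = make_trace(dash="dash")

# "spline" smooths the line; "vh" goes vertical then horizontal,
# "hv" goes horizontal then vertical.
by_shape = {shape: make_trace(shape=shape) for shape in ("linear", "spline", "vh", "hv")}

# HTML tags such as <i> are allowed in the title text.
layout = dict(title="Sales for <i>last week</i>", xaxis=dict(zeroline=False))
fig = dict(data=[by_shape["spline"]], layout=layout)

# In a notebook: from plotly.offline import iplot; iplot(fig)
```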
00:56:49Filling of the gaps. So let's do that. Let's take the third value there for sales. Remember,
00:56:54it's actually fourth because it's Python and it comes from zero. And we're just going to make that
00:56:58value in the list being none. And if we were to plot this, we see that we have this gap. So there was
00:57:05nothing for Thursday and this gap now exists. And we can actually just fill in that gap
00:57:12with the connectgaps keyword in our scatter plot here, everything else being exactly the same.
00:57:21What it'll do now is it'll just bridge that gap across this data point, which does not exist. It's none now.
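A minimal sketch of the missing value and the `connectgaps` keyword, using a dictionary trace and the sales data from above:

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
sales = [11, 14, 11, 14, 10, 11, 10]

# Index 3 is the fourth value (Thursday), because indexing starts at zero.
sales_with_gap = list(sales)
sales_with_gap[3] = None

# Without connectgaps the line breaks at the missing point;
# with connectgaps=True the neighbours are joined across it.
trace = dict(
    type="scatter",
    x=days,
    y=sales_with_gap,
    mode="lines+markers",
    connectgaps=True,
)
fig = dict(data=[trace])

# In a notebook: from plotly.offline import iplot; iplot(fig)
```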
00:57:28And it'll just fill that gap. So you can see line charts, quite a bit of fun and quite a useful thing,
00:57:33something that we use quite often. And there we go. Line charts or line plots. I'll see you in the next
00:57:39tutorial.
00:57:50So here we are in another tutorial. We're going to talk about the box and whisker plot. And it is a
00:57:54very common type of plot to use and quite informative.
00:57:57Now we're going to import the cascading style sheet, style.css, as we usually do. And let's set up our
00:58:04plotly library here. So from plotly.offline, we import iPlot and initialize notebook mode. And we
00:58:12actually call that function init notebook mode so that we can plot directly inside of the Jupyter notebook.
00:58:18And again, we're going to use high level charts. The box plot is a high level chart. So from plotly.graph
00:58:25objects or graph underscore objs, we're going to import that as go.
00:58:29So let's import numerical python because we're going to use that. And we're just going to seed the
00:58:35pseudo random number generator so that we can get the same random values every time.
00:58:41So I'm going to create three computer variables here. Let's just increase the screen size here so
00:58:46you can see properly. We have group A, group B and the control group. And I'm going to draw
00:58:52500 values each time as you can see the size argument here from a normal distribution
00:58:58with a mean of 100 in the first instance and a standard deviation of 10. So the keyword arguments
00:59:03there are LOC and scale, a mean of 110 and the standard deviation of 15 and a mean of 105 and a
00:59:11standard deviation of 20. So we're just creating these three lists or arrays at least of 500 numbers each.
00:59:18So let's do a simple box plot. Again, we're going to have a trace and now the high level chart is this box.
00:59:23So it's go dot box. And on the y axis, we want the groups. The data is then a list of all the
00:59:33traces. We only have a single trace and we use this key value pair here in the dictionary
00:59:40just to do the iplot. And there we go. And it's because we said y equals group A. So on the y
00:59:47axis here, we have all the values and that gives us this vertical box plot.
00:59:55So if I hover over there, we can see a maximum. We can see a minimum. We can see the whiskers,
01:00:02the upper and lower fences there. We can see the median and the first and third quartile values there.
01:00:08We can also see these outliers that are beyond the whiskers and I can actually hover over them and
01:00:14we can see those values as well. So let's just do more than one data set. And the way that we're
01:00:19going to do that is in a Pythonic way. So this is something new I haven't shown you before. Let's
01:00:25increase the size one more time so it's nice and clear. So I'm going to have this empty
01:00:31list here called trace and I'm going to have values inside of a list, a Python list group A,
01:00:36group B and control group. Those are the arrays that I created above. And then groups, I'm going
01:00:42to have this list of strings, group A, group B and group C. So I'm going to use a little for loop. So
01:00:49I'm going to say for I in range zero to the length of the groups. So the groups here is one, two,
01:00:55three. So it's going to go from zero to three, which in Python language means zero, one and two.
01:01:01So it's going to loop through a zero instance, a one instance and a two instance. So I'm going to
01:01:07append to this trace empty list a box and the Y is going to be VALS I. So the first one is VALS
01:01:17zero. VALS zero is group A. So I'm going to say Y equals group A and the name equals the first one
01:01:23or the zeroth one here in this groups list. So that'll be group A and my data is going to be
01:01:30trace. Now I'm going to run through this three times. So I'm actually going to have
01:01:34three traces. And this trace is a list. It's inside of square brackets. So I'm just going to have all
01:01:39of them there. So I hope you can see what's going on with this for loop. It's a Pythonic way of handling
01:01:44this. Instead of making three traces, I'm making one single, I'm doing it once in a for loop.
01:01:51So if I were to run that, well, let's just run our three values there, our three computer variables
01:01:59say, and then run our for loop. And now we can see we've got three traces named group A, group B,
01:02:03and group C, and we've plotted each of them. No problem. Now let's go through this again. And what
01:02:12we're going to say here, the only difference we're going to make is that we're going to do
01:02:17boxpoints equals outliers. So it's another argument that I'm adding to this box
01:02:24chart that I'm creating here. And although it's no different from what we've seen there,
01:02:31we've just explicitly said that we want these outliers now to be identified properly. I can also
01:02:37say, you know, omit the outliers and then these will disappear. They won't be shown here at all.
01:02:43Now there is more than one way of doing horizontal box plots, but the easiest way is just to change from
01:02:49the y-axis to the x-axis. So that's the only thing that I'm changing here is to say that this
01:02:54must now be on the x-axis. And we see these values of my variables are now on the y-axis,
01:03:01making these box plots horizontal. No problem whatsoever. Now, instead of just these outliers,
01:03:09we can actually have all the points in there. And another argument I'm going to add here is box
01:03:14points. And I'm going to say all. I'm going to add a jitter of 0.2 and a point position of negative 1.5.
01:03:21Let me show you what that ends up being. There we go. It just shows all those 500
01:03:28points in here. The jitter means it's not down a straight line, which means you usually can't see
01:03:34them. I'm making them left, right, left, right, left, right. There's a bit of jitter
01:03:38on the axis here, just so that you can see them all. And the point position is negative 1.5. So
01:03:46that means on the left, just move them slightly away from this little box plot that we have here.
01:03:52And now we see all the box points, all the values plotted there. I can add a mean. So another argument
01:04:03that I'm adding here is box mean, because remember, what we see here in the middle is the median. We
01:04:08can also add the mean. Let's do that. And that'll draw this little horizontal line. I hope you can
01:04:13see it there, which because we've taken this from normal distributions, there's not going to be much
01:04:18difference between the mean and the median for all three of these. We can also do the mean and the
01:04:23standard deviation by setting box mean equal to this sd. If we run that, we can see that we have the mean
01:04:31and we have the standard deviation out here on this dotted line. Let's play with the line colors.
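The box options walked through above can be collected in one place. This is a sketch, not the notebook's code: `box_trace` is a hypothetical helper, the sample values are made up, and the traces are plain dictionaries using the Plotly attribute names `boxpoints`, `jitter`, `pointpos` and `boxmean`.

```python
def box_trace(values, name, horizontal=False, **options):
    """Hypothetical helper: build a plain-dict box trace and merge in extra options."""
    axis = "x" if horizontal else "y"  # putting the data on x makes the box horizontal
    trace = {"type": "box", axis: values, "name": name}
    trace.update(options)
    return trace


sample = [2.1, 2.4, 2.5, 2.7, 3.0, 3.1, 3.3, 9.8]  # made-up values with one outlier

outliers_only = box_trace(sample, "Group A", boxpoints="outliers")  # mark outliers only
no_points = box_trace(sample, "Group A", boxpoints=False)           # draw no points at all
sideways = box_trace(sample, "Group A", horizontal=True)            # horizontal box
all_points = box_trace(sample, "Group A", boxpoints="all",
                       jitter=0.2, pointpos=-1.5)                   # every point, left of the box
mean_line = box_trace(sample, "Group A", boxmean=True)              # add the mean
mean_and_sd = box_trace(sample, "Group A", boxmean="sd")            # mean plus standard deviation

# iplot([all_points])  # uncomment in a notebook where Plotly is initialised
```

A negative `pointpos` places the points to the left of the box and a positive one to the right; `jitter` spreads them sideways so they do not overlap on a single line.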
01:04:39So nothing really changed. I'm going to introduce line and that's a dict with a color. And we're going
01:04:44to make this black 0, 0, 0. There we go. That's this black with a width of one. We're not going to
01:04:52show the legends. So I can also take away the legend that we have here on the side. Let's run that.
01:04:59And there we see that everything is now in this gray scale, which perhaps is a better way to submit
01:05:06for publication. We can actually have a lot of control over what happens. And in this instance,
01:05:12I'm making my line black again. My fill color I'm going to iterate over with this for loop.
01:05:20And every time I'm going to change that. So that's a dark gray, a middle gray and a lighter gray. So
01:05:25that is the fill color, which I didn't specify there. And I'm also going to have
01:05:32the marker, the outliers. I'm going to specifically change the color of the outliers. And this is a bit
01:05:39of an orange. And I'm going to use this open circle, one of the key value pairs inside of this
01:05:45dictionary, a symbol, and then also the size. So a lot of things that I can really play with.
01:05:50And if we look at this, this is actually quite beautiful. We have our dark gray, our middle
01:05:54gray, and our light gray, as I said there, and our outliers here are these 10 point sized
01:06:00open circles that are colored in orange. So really a lot that you can play with
01:06:06when dealing with these box plots.
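The gray-scale styling described above might look like this in dictionary form. It is a sketch under assumptions: the data and the exact gray and orange values are made up, and `zip` is used in place of the index-based loop from the video.

```python
vals = [[1, 2, 3, 4, 12], [2, 3, 4, 5, 15], [1, 3, 5, 7, 20]]  # made-up data
groups = ["Group A", "Group B", "Group C"]
# Assumed gray values: dark, middle and light.
fill_colors = ["rgb(80, 80, 80)", "rgb(140, 140, 140)", "rgb(200, 200, 200)"]

data = []
for values, name, fill in zip(vals, groups, fill_colors):
    data.append({
        "type": "box",
        "y": values,
        "name": name,
        "line": {"color": "rgb(0, 0, 0)", "width": 1},  # black outline, width 1
        "fillcolor": fill,                               # a different gray per box
        "marker": {"color": "orange", "symbol": "circle-open", "size": 10},  # outliers
    })

layout = {"showlegend": False}  # hide the legend on the side
figure = {"data": data, "layout": layout}
# iplot(figure)  # uncomment in a notebook where Plotly is initialised
```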
01:06:21Here we are in another tutorial. And this time around, we're going to look at subplots.
01:06:25How can we add a few more plots to the same figure? Let's run our first cell. That's just
01:06:33importing our cascading style sheet. There we go. We're going to import iplot and init notebook
01:06:39mode from plotly.offline. And we're going to initialize this notebook mode. I'm also going to
01:06:46import the high level chart objects as go, and a new one: we're also going to import tools just from
01:06:55the plotly library. Also importing numerical python. And we're going to seed the pseudorandom number generator
01:07:02just with a value of one, two, three, four. First of all, two plots on the same row. So I'm going to
01:07:12create two traces. They're both going to be scatter plots. Now we've created this data, data that we've
01:07:18seen before, just a random integer chosen from a low of 20, a high of 65 and 100 of those. We're going
01:07:26to call that female age, then male age, female salary, male salary. And now we have debt. And we
01:07:32have female tax equals male tax. Just a bunch of random data that we've created. We've looked at it
01:07:37before. So trace zero is a scatter plot. On the y-axis, we're going to have the female salary,
01:07:44on the x-axis, I should say, and on the y-axis, female age. We have markers as the mode. So not
01:07:52lines or lines and markers, just markers. We're going to call this female and the marker is
01:07:58going to have a certain color and a certain size. We're going to have trace one, also scatter exactly
01:08:04the same thing. Here comes the new part. I'm going to create a figure, a figure object here, just going
01:08:09to call it fig. And that's going to be tools dot make underscore subplots. And that's how you do
01:08:15those. And I want one row and two columns. If you think about it, if there's a single row and two
01:08:20columns, these plots are going to be next to each other. And I'm going to give each subplot a title.
01:08:27So I'm also going to use the argument subplot underscore titles. And I'm going to use this list
01:08:31of female and male. Now we need to append the traces to this figure. So we're going to say fig dot append
01:08:39underscore trace. Trace zero goes in position row one, column one. And trace one goes in row one,
01:08:47column two. So very easy to figure out how this is going to work. Fig dot layout dot update. You can
01:08:54also update a few more things in this instance, just a title. Let's run this and see what it looks like.
01:09:01And there we go. We see the two plots side by side, the subplot with the name drawn from the name
01:09:08and the trace male on the side, the colors we've drawn in, and we see the plots side by side.
01:09:16So our first subplot. Now, if we look at it, the y axis here are exactly the same 20 and 20, 30 and 30.
01:09:24We might as well just have one on the left hand side. So let's look at how to share that y axis.
01:09:30Again, the two traces, nothing has changed. The two, the subplots I've created, one row, two columns,
01:09:36female and male. I've now added a new argument called shared underscore yaxes.
01:09:43I'm going to set that to true. Everything else is exactly the same. And if we run this,
01:09:49we see indeed that the axis on this right hand side, the male side has disappeared. We are now
01:09:55sharing the same y axis. Okay. What about two rows in one column? So all we're going to do is just to
01:10:04shift this around with make underscore subplots here. We're going to have two rows in one column
01:10:09and we're going to say a shared x axis. And you've got to think about this. These two traces that you
01:10:14are creating or the number of traces that you're creating, they've got to have
01:10:18the same domain and range here. Otherwise, this might not make a lot of sense. So you've got to
01:10:23think about that. And if we run that, we see we have two rows in one column. And also we're
01:10:30sharing this single x axis. And no matter where we are, we can still get all the data because
01:10:37this is plotly, no matter where we hover, we're still going to get our data.
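The two grid arrangements above, side by side with a shared y-axis and stacked with a shared x-axis, can be sketched as follows. Several things here are assumptions: the salary range is made up, the traces are plain dictionaries, and `build_subplots` is a hypothetical helper that imports `make_subplots` from `plotly.subplots` (where recent Plotly versions keep it; the video reaches it through `tools`). It uses `add_trace` with `row` and `col`, which places traces the same way as the `append_trace` call in the video.

```python
import random

random.seed(1234)
female_age = [random.randint(20, 65) for _ in range(100)]
male_age = [random.randint(20, 65) for _ in range(100)]
female_salary = [random.uniform(20000, 90000) for _ in range(100)]  # made-up range
male_salary = [random.uniform(20000, 90000) for _ in range(100)]    # made-up range

trace0 = {"type": "scatter", "x": female_salary, "y": female_age,
          "mode": "markers", "name": "Female"}
trace1 = {"type": "scatter", "x": male_salary, "y": male_age,
          "mode": "markers", "name": "Male"}

# One row, two columns, one y-axis shared by both plots.
side_by_side = dict(rows=1, cols=2, subplot_titles=["Female", "Male"], shared_yaxes=True)
row_layout = [(trace0, 1, 1), (trace1, 1, 2)]  # (trace, row, column)

# Two rows, one column, one x-axis shared by both plots.
stacked = dict(rows=2, cols=1, subplot_titles=["Female", "Male"], shared_xaxes=True)
column_layout = [(trace0, 1, 1), (trace1, 2, 1)]


def build_subplots(grid_kwargs, placed_traces, title):
    """Hypothetical helper. Needs plotly; imported lazily so the rest runs without it."""
    from plotly.subplots import make_subplots
    fig = make_subplots(**grid_kwargs)
    for trace, row, col in placed_traces:
        fig.add_trace(trace, row=row, col=col)  # row first, then column
    fig.layout.update(title=title)
    return fig


# fig = build_subplots(side_by_side, row_layout, "Age against salary")
# iplot(fig)  # uncomment in a notebook where Plotly is initialised
```

Sharing an axis only makes sense when both traces cover a comparable range on that axis, which is the case here because both age lists are drawn from the same interval.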
01:10:42So what about constraining the proportions? Now this one gets a bit more difficult
01:10:46and we need to pay a bit of attention here. We're still going to create our trace,
01:10:50that's a scatter. But our trace, the second one we're creating here, trace one,
01:10:55I'm going to put two new arguments. And that is x axis equals x2 as a string and y axis equals y2.
01:11:05Telling plotly that this goes on a different axis altogether. Now we don't need to do that
01:11:10if we're not doing anything fancy as we've seen before. But here we want to do something. So I'm going
01:11:15to use a different way of creating the data. That's a python list trace zero and trace one.
01:11:22And I'm going to have a separate layout as we've had before and all the other tutorials that we've
01:11:28had. So that's go.layout. I'm still going to have a title. I'm going to have an x axis. And I'm going
01:11:35to pass a dictionary to that. And that dictionary is a domain. And think about this x axis going from
01:11:400 to 1. So 0 to 100% of the width of what you have available. And here in the domain,
01:11:48we're saying use the leftmost 70% of the area of our figure for our first trace.
01:11:56So the x axis goes from 0 to 0.7. X axis 2, remember, which comes in this one here,
01:12:02we said x axis 2 equals x2. So x axis 2 goes from 80% to 100%. So the 20% on the right hand
01:12:10side. And we just have to say that y axis 2, so the second plot is anchored to x2. So it will know
01:12:19that it has to anchor the second y axis to this part that we've specified up here. So go and mess
01:12:24it about and put it in the first one. And then fig equals go.figure. So we've done it this way around.
01:12:31Remember, you could just say iplot and use the standard dictionary format that we've seen before.
01:12:37So let's run that. And now we can see we're constraining this right hand side to the last 20%
01:12:42and 70% on this side. And this y axis was told to fit with this little one on the right hand side.
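The domain-based layout above can be written entirely in the dictionary format. This is a sketch with made-up data points and an assumed title; the `domain`, `anchor`, `xaxis` and `yaxis` keys are the ones named in the video, and `width_share` is a hypothetical helper added only to check the proportions.

```python
trace0 = {"type": "scatter", "x": [30000, 45000, 52000], "y": [25, 38, 51],
          "mode": "markers", "name": "Female"}
trace1 = {"type": "scatter", "x": [28000, 47000, 60000], "y": [22, 41, 63],
          "mode": "markers", "name": "Male",
          "xaxis": "x2", "yaxis": "y2"}  # send the second trace to its own pair of axes

data = [trace0, trace1]

layout = {
    "title": "Constrained proportions",  # assumed title
    "xaxis": {"domain": [0, 0.7]},       # leftmost 70% of the figure width
    "xaxis2": {"domain": [0.8, 1]},      # rightmost 20% of the figure width
    "yaxis2": {"anchor": "x2"},          # draw the second y-axis next to xaxis2
}

fig = {"data": data, "layout": layout}


def width_share(axis):
    """Hypothetical helper: fraction of the figure width taken by an x-axis."""
    start, end = layout[axis]["domain"]
    return end - start


# iplot(fig)  # uncomment in a notebook where Plotly is initialised
```

The gap between 0.7 and 0.8 is left empty on purpose; it is the space in which the second y-axis and its tick labels are drawn.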
01:12:51Okay, let's customize the axis a little bit more. Here's a long one for you. We're just creating these
01:12:58four traces. They're all histograms. We're going to make a subplot and we're going to say we want two
01:13:02rows and two columns. We give a name to each of those. You don't have to, but we're going to call
01:13:08it figure A, B, C, and D. And I append each of those to the position that I want. Row one, column one,
01:13:13row one, column two, row two, column one, row two, column two. So it's always row first and then column.
01:13:19And it's got to line up with what you specified up here. So fig dot layout, x axis one, update its
01:13:27title, x axis two, update its title, x axis three and four. And then y axis one, two, three, four.
01:13:34These have all got to line up with the number of plots you've created there. And I can give each of the
01:13:38axes its own separate title. We're just going to update the main title. And one new thing here,
01:13:46I'm just going to say show legend equals false because I'm building in all the detail into each
01:13:50of these little subplots. Let's run that. And there we go. We get our four histograms that we created
01:13:57here. Figure A, B, C, D. They've all got the axis. They're all nicely labeled. No problem whatsoever.
01:14:04So you can see female age and frequency, male age and frequency, female salary frequency,
01:14:08male salary frequency. So I can individually name each of these axes so we know what it's all about.
01:14:15Let's just look lastly at an odd pairing. So I'm going to introduce a new argument here to
01:14:22the tools.make subplots. I want two rows and two columns. But you can see I've only listed three
01:14:28things here. That's because I want the bottom two to be combined and I wanted to make up one single
01:14:36long plot. So two at the top. The first row will have two columns. The second row, just a single
01:14:42column. So it's going to go all the way across. So I've got to introduce the specs. Now look at it.
01:14:47We see it's a list and it's a list in two parts. The first part is going to be for this first row,
01:14:53just showing the two there. And the second one, we're going to say the column span spans both of
01:15:00the columns and then none because it is just the full thing that we want because we only want these
01:15:06three figures. Otherwise, everything is exactly the same as we've seen before. And there we go.
01:15:12See what we did? It spans the whole, this third one here spans this whole row on both sides. So look
01:15:18and play around with this spec, specs argument, but you can clearly see how that was built up for the
01:15:25two rows. And we want those two in the first row, but we want a single one in the second row there.
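The `specs` argument described above is a list with one inner list per row. The sketch below shows one plausible shape for the two-over-one arrangement; the subplot titles are assumed, `occupied_columns` is a hypothetical check, and `build_figure` is a hypothetical helper that needs Plotly and imports `make_subplots` from `plotly.subplots`.

```python
specs = [
    [{}, {}],                # first row: one default cell in each column
    [{"colspan": 2}, None],  # second row: one cell spanning both columns, then an empty slot
]

grid_kwargs = dict(rows=2, cols=2, specs=specs,
                   subplot_titles=["Figure A", "Figure B", "Figure C"])  # assumed titles

positions = [(1, 1), (1, 2), (2, 1)]  # (row, column) for each of the three traces


def occupied_columns(spec_row):
    """Hypothetical check: columns used by one row of specs (None adds no cell)."""
    return sum(cell.get("colspan", 1) for cell in spec_row if cell is not None)


def build_figure(traces):
    """Hypothetical helper. Needs plotly; imported lazily so the rest runs without it."""
    from plotly.subplots import make_subplots
    fig = make_subplots(**grid_kwargs)
    for trace, (row, col) in zip(traces, positions):
        fig.add_trace(trace, row=row, col=col)
    return fig


# fig = build_figure([trace0, trace1, trace2])  # three traces defined elsewhere
```

Every row of `specs` has to account for all the columns in the grid, which is why the second row carries a `None` after the cell that spans two columns.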
01:15:34And that is subplots. Have some fun enjoying your subplots. It really helps to be able to do subplots
01:15:40and especially if you're creating reports or manuscripts for publication to have more than one
01:15:46element in the same plot. Remember to subscribe and hit that like button if you want to. Hit the
01:15:54notification button though. That's important so that you get notified when new material is released.
01:15:59Remember that you can also find all of these files on GitHub. A link is in the description below.
01:16:06Speak to you next time.
01:16:16In this tutorial, I want to show you how easy it is to import images into a Colaboratory notebook.
01:16:23So I'm on my Google Drive here. I've navigated to where I keep my YouTube tutorials and I'm going to simply say
01:16:29new, scroll down to more and then Colaboratory.
01:16:35After a few seconds, it opens a Jupyter notebook for me. Let's change the name of this notebook.
01:16:40I'm just going to highlight untitled 0 and we're going to say display images as the title for our
01:16:49Jupyter notebook. So we've got our first cell here and I'm going to import two things. I'm going to say
01:16:56from google.colab we're going to say import files. Very important because we're going to use that files
01:17:06to actually import our image. And then to display the image in a notebook, I'm going to say from
01:17:13ipython.display import image. So those two lines of code and we're going to use them as I say to
01:17:24import our image and then to display our image. We'll add a new code cell and then here we're going to
01:17:32create a computer variable called uploaded and that is going to be files.upload. So we're
01:17:41going to call this upload and we can now choose a file. It's going to open the file browser that is
01:17:53built into our operating system and I'm going to choose from one of those files. It's going to upload now as
01:18:01you can see 0%, 18%, 35% and there you go 100%. Now you'll notice it says saving krglogo.lightpng to
01:18:17krglogo.lightspace2. So I've got two other notebooks open and I've already loaded these logos there
01:18:25and an instance of that will exist. And it is this name that I now have to use to display
01:18:31whatever it was called during the upload is what I'm now going to use. So let's use image and inside
01:18:38of image we're now going to use krglogo.lightpng. And I just want to add one more argument because it's
01:18:51quite a large image that I'm importing here. So it's width. Let's set the width here in my case to 1200.
01:19:00Let's run that. And there we go we can see the logo being displayed there beautifully.
01:19:07So that is how you import an image and how you display an image in a Colaboratory Jupyter Notebook.
01:19:13In this tutorial I want to talk to you about using Plotly inside of Google's Colaboratory.
01:19:21So I went to my Google Drive and I've opened a new Jupyter Notebook as a Colaboratory file here.
01:19:30I've given it a name and imported my research group's image. So let's talk about importing or using Plotly
01:19:39inside of Colaboratory. Now it is one of the libraries that has already been loaded. So it is
01:19:46no problem just to use Plotly. You can just import it as you would usually do. So we're going to import
01:19:53NumPy as np. From SciPy I'm going to import stats just so that we can simulate some data. And then from
01:20:00Plotly.offline I'm going to import iPlot and init notebook mode. And I'm also going to import Plotly.graph
01:20:07objects as go. Now notice carefully I'm not initializing the notebook mode right now.
01:20:15So to run this we click on the little arrow to the left hand side and that cell executes.
01:20:22You'll notice on the top right hand corner it has connected and it is connected to the Google
01:20:29engine as far as Python is concerned. Now you can pause the video here and copy this function. You've
01:20:36got to create a function that you then call in every cell that you want to use Plotly in. So I've
01:20:43called it configure underscore Plotly underscore browser underscore state and we're going to import
01:20:48ipython and we're going to write this display script. As I said you can just copy and paste it. Let's run.
01:20:56And now let's get to simulating some data. So I'm just going to seed the pseudorandom number generator
01:21:05there with the number 1 and I'm going to create a computer variable called WCC for instance white cell
01:21:11count and I'm going to take that from a normal distribution with a mean of 15 standard deviation
01:21:16of 3 and I want 1000 of these data point values. And now we can plot. Now as I mentioned here it's
01:21:24required in every cell that you want to run a Plotly graph in. So you've got to use these two lines of
01:21:31code. So we're going to call this function and then we're going to initialize the notebook mode with
01:21:35connected equals False. So you can copy and paste that as well. That has got to go in every cell.
01:21:43And now it's just normal Plotly. I'm going to use trace 0 as is the norm. Go dot histogram. My x-axis
01:21:49is going to be the white cell count. My data object is going to hold the trace as its single
01:21:54list element. And I'm just going to make the plot look a bit better by introducing some layout. And
01:22:00then I can call IPlot and I use as always the dictionary format. So data being data and layout
01:22:06being layout. Let's run that. And there you go. We have a Plotly graph right inside of a
01:22:13Colaboratory Python notebook. Beautiful. So copy and paste the definition and remember to
01:22:19call that function in every cell that you are going to use to create a
01:22:26Plotly graph.
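The data simulation and histogram above can be sketched like this. The sketch makes several assumptions: it draws from the standard library's `random.gauss` where the video uses NumPy and SciPy, it writes the trace as a plain dictionary where the video uses `go.Histogram`, and the layout titles are made up. The two per-cell calls are shown only as comments, because `configure_plotly_browser_state` is the function copied from the video and is not reproduced here.

```python
import random
import statistics

random.seed(1)  # seed the pseudorandom number generator with 1

# White cell count: 1000 values from a normal distribution, mean 15, standard deviation 3.
wcc = [random.gauss(15, 3) for _ in range(1000)]

trace0 = {"type": "histogram", "x": wcc}
data = [trace0]  # the trace is the single element of the data list

layout = {  # assumed titles, only to make the plot easier to read
    "title": "White cell count",
    "xaxis": {"title": "Count"},
    "yaxis": {"title": "Frequency"},
}

figure = {"data": data, "layout": layout}

# In Colaboratory, every plotting cell would start with these two calls:
# configure_plotly_browser_state()
# init_notebook_mode(connected=False)
# iplot(figure)

print(round(statistics.mean(wcc), 2), round(statistics.stdev(wcc), 2))
```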