Time Series Analysis in Python | Time Series Forecasting | Data Science with Python | Edureka



Hey guys, this is Aayushi from Edureka, and in today's session we will be discussing an algorithm that helps us analyze past trends and lets us forecast what is going to unfold next. That algorithm is time series analysis. Now let's quickly jump onto our agenda and see what we are going to cover in today's training. We'll start off by understanding why we need time series analysis, and then we'll understand what exactly it is. Once we are clear with time series, we'll see the different components we need to take care of while applying time series. Then we'll discuss when you should not use time series analysis, that is, the cases where you should not apply it. Moving ahead, we'll discuss what stationarity is and which tests are used to check the stationarity of the data. Next, we'll discuss ARIMA models. The ARIMA model is one of the best models used in time series, so we'll have a discussion on that, and finally we'll go ahead with the demo part, where we'll implement all these things and help you guys forecast the future as well. So I hope you guys are clear with the agenda.
Kindly drop me a quick confirmation, or just write it in your chat box, so that I can proceed. All right, Monica says yes, okay, they gave me a thumbs up; Naman, Shivani, all right. Since you guys are clear, let's begin with the very first topic, that is, why you should use time series analysis.

First of all, in time series analysis you just have one variable, that is, time. You must have seen that there are a lot of algorithms out there, so why do we need one more algorithm, time series? Let me explain this with an example. Take supervised learning: under supervised learning we have linear regression or logistic regression. There we have an independent variable and a dependent variable, and we deduce a function, or a mapping function, of how one variable is related to another, and then we go ahead with the analysis part. But in time series analysis you just have one variable, that is, time. For example, say you own a coffee shop, and it's quite a successful coffee shop in the town. You want to see how many cups of coffee you sell every month, so you add up all the sales of your coffee. Let's say you started this coffee shop in the first month, that is, January. You record the data month-wise and then sum it up, so you will have all the data till the present month. But what if you want to know the sales for the next month or the next year?
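As a rough illustration of the coffee-shop bookkeeping just described, the sketch below sums hypothetical daily cup counts into monthly totals with pandas. The dates, the counts and the variable names are all made up; the point is only that the result has one observation per month at evenly spaced intervals.

```python
import numpy as np
import pandas as pd

# Hypothetical daily cup counts for a coffee shop that opened in January.
# The numbers are random and purely illustrative.
rng = np.random.default_rng(0)
days = pd.date_range("2023-01-01", "2023-06-30", freq="D")
daily_cups = pd.Series(rng.integers(80, 120, size=len(days)), index=days)

# Sum the daily records into one total per calendar month.
# "MS" labels each monthly bucket with the first day of that month.
monthly_cups = daily_cups.resample("MS").sum()

print(monthly_cups)
# The resulting index is evenly spaced: one observation per month.
print(monthly_cups.index.freqstr)
```

Forecasting "next month's sales" then means predicting the next value of `monthly_cups` from its own history, with time as the only other dimension.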
Now imagine, guys, you just have one variable, that is, sales, and you need to predict that variable in accordance with time. In such cases, where you just have time and you need to predict the other variable, you need time series analysis.

Now that we know why we need time series analysis, let's move ahead and understand what exactly a time series is. A time series is a set of observations, or data points, taken at specified times. On the x-axis you have time, and on the y-axis you have the magnitude of the data. So if you plot a time series, the x-axis will always show time divided into equal intervals. You cannot create a time series in which one data point is at the week level and the others are at a different level; the intervals should be equal, say a day, a week, a month, a year, a decade or a century. That is the one constant thing a time series requires.

Now let us see the importance of time series analysis. First and foremost is business forecasting, because your past defines what is going to happen in the future. For example, you'll see a lot of traders on the Sensex trying to predict what the stock price will be tomorrow; that is nothing but business forecasting. You also see a lot of retailers who try to know how many goods they are going to sell the next day. All of this can be achieved with time series analysis, and it is not limited to one domain like retail or finance; it is applicable almost everywhere. It also helps us analyze past behavior: you can analyze in which month the sales went up or when there was a dip, so you can easily understand your past data. Every dip and peak has a business reason attached to it, and you can understand it with respect to time. For example, if there is a festival and you're selling chocolates, your sales will increase during the festival, so you need to think about the seasonality part as well. Don't worry, guys, we'll have a complete discussion on seasonality too. Coming back, it also helps you plan future operations: you can analyze the past and then forecast your future using time series analysis. Apart from all this, we can also evaluate current accomplishments, meaning you can determine which goals you have met in the current scenario. Let's say you predicted you would sell around a hundred chocolates in a day; did you actually do that? All of this can be analyzed using time series analysis.
Moving ahead, let us see the different components of time series. Most time series have trend, seasonality and irregularity associated with them, and some also have cyclic patterns, but it is not compulsory that every pattern be present. So let us discuss each one of them in detail.

The first is trend. A trend is nothing but a movement to relatively higher or lower values over a long period of time. When the time series shows a general upward pattern, we call it an uptrend; if it shows a downward pattern, we call it a downtrend; and if there is no trend, we call it a horizontal or stationary trend. Let me explain this better with an example. A new township has been constructed and people are going to come and live there. A hardware guy opens up a shop there, and the people moving in will definitely buy stuff from it. Once all the houses are settled and occupied, the demand for hardware reduces, so the trend may go down. Let's say the sales were up in the first year, and in another year, or maybe in six months, they have gone down. That is the trend, guys: for some amount of time selling was high and then it went down. This is not a repeating pattern; a trend is something that happens for some time and then disappears.

Then we have seasonality. Seasonality is also an upward or downward swing, but it is quite different: it's a repeating pattern within a fixed time period. For example, Christmas happens every year on the 25th of December. Let's say you're in the chocolate business: year on year, more and more chocolates are sold in the last week of December. This is because of Christmas, and you've been able to see this across years, that is, over the past two years, four years, six years, ten years and so on. So seasonality is a repeating pattern within a fixed time period, while a trend is not. Let me take another example, ice cream this time. Ice cream sales will be comparatively higher in summer than in winter, so that is again seasonality.
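To make the difference between trend and seasonality concrete, here is a small sketch on made-up monthly data: a steady upward trend plus a smooth swing that repeats every 12 months (a stand-in for something like summer ice-cream sales). The slope, amplitude and dates are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Ten years of synthetic monthly sales built from two components.
months = pd.date_range("2010-01-01", periods=120, freq="MS")
t = np.arange(len(months))

trend = 100 + 2.0 * t                        # slow, long-term rise
seasonal = 20 * np.sin(2 * np.pi * t / 12)   # repeats every 12 months
sales = pd.Series(trend + seasonal, index=months)

# Removing the trend leaves only the seasonal swing,
# and that swing is the same from one year to the next.
detrended = sales - trend
print(np.allclose(detrended.values[:12], detrended.values[12:24]))  # True
```

The trend part never comes back to where it started, while the seasonal part returns to the same values every 12 months; that repetition within a fixed period is what separates the two.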
Then we have irregularity, also called noise. It is erratic, or unsystematic, in nature, and it is also called the residual. It happens for a short duration and is non-repeating. Let me give you an example: say there is a natural disaster, a flood in your town out of nowhere one year. A lot of people buy medicines and ointments for relief, but after some time, when everything is settled, the sales of those ointments go down. No one could have predicted this; it happens erratically, you don't know how many sales are going to happen, and you cannot forecast the flood itself. So this is random variation, and this is what irregularity is.

Moving ahead, we have cyclic patterns. These are repeating up and down movements that can span more than a year. They don't have a fixed period, so they can happen anytime: let's say in two years, then in the fourth year, then maybe in six months. They keep repeating, and they are much harder to predict.

Now let's discuss when not to apply time series analysis. First of all, you cannot apply time series analysis when the values are constant. Let me take the same coffee example. Let's say the number of coffees sold in the previous month was 500, this month the number is almost the same, 500, and I want to predict the sales for next month. In such cases, where the values are constant, time series cannot be applied. Similarly, if you have values in the form of functions, let's say sin x or cos x, you can get any value just by putting x into the function, so there is no point in applying time series analysis when you can calculate the values directly. Technically you can apply time series to these as well, but again there is no point if you already have a formula or the values are just constant. So these are the cases when you should not apply time series.

Moving ahead, let us see what stationarity is. No matter how much you try to avoid the stationarity part, guys, it will always be there in time series. Time series models require the data to be stationary: whatever statistical model you apply to a time series, the data should be stationary. So let's understand what exactly it is. Most models work on the assumption that the time series is stationary. If a time series has a particular behavior over time, there is a very high probability that it will follow the same behavior in the future. Also, the theories and formulas related to stationary series are more mature and easier to implement than those for non-stationary series.
There are two major reasons behind the non-stationarity of a time series. The first is trend, which is basically a mean that varies over time, and the second is seasonality, which is variation within a specific time frame. But did you guys get the answer to this question: what exactly is stationarity, or how is it defined? Stationarity has very strict criteria. The first is a constant mean: the mean should stay constant over time. The second is constant variance: the variance should be equal at different time intervals. The third is an autocovariance that does not depend on time. For those of you who don't know these terms, I won't go into the details, but I'll explain them in a nutshell. The mean is basically the average. The variance measures how far the points spread out from the mean (the average squared distance from it), and for stationarity that spread should be the same throughout the series. The autocovariance should not depend on time. For example, let's say you're standing at time t, and the previous two time periods are t-1 and t-2. The relationship between the value at t and the values at t-1 or t-2 should depend only on how far apart they are, not on which time period you happen to be standing at.
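Written as formulas, the three conditions above are the usual definition of weak (covariance) stationarity for a series $X_t$, where the autocovariance $\gamma(h)$ is a function of the lag $h$ only:

```latex
\begin{aligned}
\mathbb{E}[X_t] &= \mu && \text{for all } t \quad \text{(constant mean)}\\
\operatorname{Var}(X_t) &= \sigma^2 && \text{for all } t \quad \text{(constant variance)}\\
\operatorname{Cov}(X_t, X_{t-h}) &= \gamma(h) && \text{for all } t \quad \text{(depends only on the lag } h\text{, not on } t)
\end{aligned}
```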
That is what the autocovariance condition means. When these three conditions are met, we can say that a series is stationary, and then we can apply time series analysis to it. Now, to check stationarity in Python we have two popular tests: the first is rolling statistics, and the second is ADCF, the augmented Dickey-Fuller test. With rolling statistics, we plot the moving average or moving variance and see whether it varies with time.
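A minimal sketch of the rolling-statistics check, assuming two synthetic monthly series: white noise, which is stationary by construction, and the same noise with a linear upward trend added. The 12-month window matches a yearly view of monthly data; the seed and slope are arbitrary.

```python
import numpy as np
import pandas as pd

# Two made-up monthly series: white noise (stationary) and the same noise
# with an upward linear trend added (non-stationary).
rng = np.random.default_rng(42)
idx = pd.date_range("2000-01-01", periods=240, freq="MS")
noise = pd.Series(rng.normal(0, 1, size=len(idx)), index=idx)
trending = noise + 0.05 * np.arange(len(idx))

window = 12  # one year of monthly observations
results = {}
for name, s in {"noise": noise, "trending": trending}.items():
    roll_mean = s.rolling(window=window).mean()
    roll_std = s.rolling(window=window).std()
    # Rolling mean at the first full window and at the end of the series,
    # plus the final rolling standard deviation.
    results[name] = (roll_mean.iloc[window - 1], roll_mean.iloc[-1], roll_std.iloc[-1])
    print(name, "first mean:", round(results[name][0], 2),
          "last mean:", round(results[name][1], 2))
```

Plotting `s`, `roll_mean` and `roll_std` on one chart gives the visual check: for the noise the rolling mean hovers around zero, while for the trending series it keeps climbing, which is the "varies with time" signal of non-stationarity.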
By moving average or variance I mean that at any instant t, we take the average or variance over a time window, say the last year, that is, the last 12 months. Also, guys, this is more of a visual technique, so you cannot deploy this kind of check in production, but it is quite useful for proof-of-concept (POC) purposes. Then we have ADCF, the augmented Dickey-Fuller test, which is another statistical test for checking stationarity. Here the null hypothesis is that the time series is non-stationary. Once you perform this test, you get a result comprising a test statistic and critical values for different confidence levels. If the test statistic is less than the critical value, we can reject the null hypothesis and say that the series is stationary. Don't worry, guys, I will explain this again in the demo part, but I hope you are clear with what exactly stationarity is and how we can check it.
Now let me move on to my next topic, where I will discuss what exactly the ARIMA model is. ARIMA is one of the best models for working with time series data. It is basically a combination of two models, AR plus MA, and it's quite powerful, guys: once you combine the two, you get the ARIMA model. AR stands for the autoregressive part and MA stands for moving average. AR is a separate model, MA is a separate model, and what binds them together is the integration part, indicated by I. AR is nothing but the correlation between the previous time periods and the current one. What does this mean? Say you are standing at time period t, and there are previous time periods t-1, t-2 and t-3. If you find any correlation between t-3 and t, that is the autoregressive part. As I told you earlier, there is always some kind of noise or irregularity attached to a time series, so we need to figure out that noise; in fact, we need to average it out. When we average it out, the crests and troughs present in that noise smooth out, and we can focus on the average. You can never predict when the next customer is going to come in and buy a hundred items at once, so we try to smooth that out by taking an average; that is the moving average part.

The ARIMA model has three parameters: p, q and d. p refers to the autoregressive lags, q stands for the moving average, and d is the order of differencing, so there is one parameter for each part of the model. If we difference the series once, d is one; if we difference it twice, d equals two. Each of p, q and d is found with a different method. To find p, you use the PACF graph, that is, the partial autocorrelation graph. To find q, you plot the ACF, that is, the autocorrelation plot. And for d, as I have already told you, we use differencing to make the data stationary, so the order of differencing defines d.
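To see what the order of differencing means, here is a tiny worked example with NumPy on a made-up series with a quadratic trend, y_t = t². One round of differencing still leaves a trend; a second round removes it, which is the situation d = 2 describes (a straight-line trend would need only d = 1).

```python
import numpy as np

# A made-up series with a quadratic trend: y_t = t^2.
t = np.arange(10)
y = t ** 2

d1 = np.diff(y, n=1)  # first difference: 2t + 1, still trending upward
d2 = np.diff(y, n=2)  # second difference: constant 2, trend removed

print(d1)  # [ 1  3  5  7  9 11 13 15 17]
print(d2)  # [2 2 2 2 2 2 2 2]
```

With pandas the same steps are `series.diff()` for d = 1 and `series.diff().diff()` for d = 2.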
That's enough of the theory part, so now let's quickly jump onto the demo and see how you can implement all of these things and forecast the future. Here is the problem statement: an airline has data on its passengers across months, and you need to build a forecast that determines how many passengers are going to board this airline, at the month level, in the future. We have months, or dates, from 1949 till 1960, and the number of passengers traveling per month. With this data, we need to analyze what the number of passengers will be if we have to predict it for the next ten years. So let me go to my Jupyter notebook and show you what my predictions look like.

Guys, this is my Jupyter notebook, in which I have the code, and we'll be implementing all the things we have discussed till now. First of all, we import all the necessary libraries. Here we have imported NumPy, then pandas for data analysis and data processing, and then matplotlib for data visualization, creating plots and so on. To use matplotlib, we have also written %matplotlib inline for the Jupyter notebook, so that a plot does not open in a new window; everything stays inside your Jupyter notebook itself. Then I have just defined the figure size. Let me run this.

Next, I have imported my AirPassengers data using pandas. pandas, imported as pd, has a function read_csv, and we have stored the result in a variable called dataset. Then we have parsed the month strings into datetime format, so our data is set month-wise: pandas has a function to_datetime, where you specify the Month column, and then you set it as your index, so the index is Month. Next, I have imported datetime, and then I have printed the top five values. Let me run this. This is how my data looks: Month is the index and the number of passengers is the second column. I have already shown you this data in the presentation, with data from 1949 until 1960, and here I have just printed its head. Now let me explain the tail: let's say I want to know the last five entries, so here we have data till 1960 with the number of passengers. Next, we have simply plotted a graph between them. In time series we have a date and another variable, and here the other variable is the number of air passengers, so the date is on the x-axis and the number of passengers is on the y-axis. Let me run this. This is how your data looks, and if you notice, you have a trend. Our next step is to check stationarity, so I'll give you ten seconds, guys: think about whether this data is stationary or not, and give me a reply.
Right, Shivani, this data is non-stationary. Here you can see the trend is going up. Let's say you calculate the mean for 1951: it will lie somewhere here. And if we calculate the mean for 1960, it will be somewhere up here. You can see that you have an upward trend and the mean is not constant, which tells you the data is not stationary. Now, I have told you guys that there are two tests which help you check the stationarity of the data, rolling statistics and ADCF, so let us go through each one of them, starting with rolling statistics. Here we have a rolling mean and a rolling standard deviation, and as you can see we have a window of 12, that is, 12 months. The rolling mean at each month is the average of that month and the 11 months before it, so the first full window ends in December 1949 and each value gives you the mean at a yearly level; you do the same with the standard deviation. In Python, you have the functions .mean() and .std(), which calculate the mean and standard deviation automatically.
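A self-contained sketch of the loading and rolling steps described so far. Instead of the demo's CSV file it reads a few inline rows in a Month / #Passengers layout; the column names and values are illustrative assumptions. `parse_dates` plus `index_col` does in one call what the `to_datetime` and `set_index` steps achieve.

```python
import io
import pandas as pd

# A few rows in a Month / #Passengers layout. The column names and values
# are illustrative assumptions standing in for the demo's CSV file.
csv_text = """Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121
1949-06,135
1949-07,148
1949-08,148
1949-09,136
1949-10,119
1949-11,104
1949-12,118
1950-01,115
1950-02,126
"""

# Parse Month as dates and make it the index.
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"], index_col="Month")
passengers = indexed["#Passengers"]

# 12-month rolling mean and standard deviation.
rolmean = passengers.rolling(window=12).mean()
rolstd = passengers.rolling(window=12).std()

print(rolmean.isna().sum())  # 11: the first 11 months have no full window yet
print(round(rolmean.loc[pd.Timestamp("1949-12-01")], 2))  # 126.67
```

The 11 leading NaN values appear because `rolling(window=12)` only produces a value once 12 observations are available.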
Now let me run this. If you notice, your first 11 rows are NaN, that is, not a number. This is because each rolling value needs a full window of 12 months and the first 11 months don't have enough history; from the 12th month onwards, the average of the last 12 values is placed there, and the same goes for the following months. If you scroll a little, you see it's a long dataset, and you get the same result for the standard deviation as well; it's the same procedure, guys: the value is calculated over the window and placed there. You might be asking why only 11 values are NaN. That's because we have given a window of 12. If your data were at the daily level, your window size would be 365; here the data is at the monthly level, so the focus is monthly. I hope you get why I am giving the window as 12. After calculating the mean and standard deviation, we have simply plotted the rolling statistics: the original data in blue, the mean we just calculated in red, and the standard deviation in black. After that, we have added a legend and a title. Let me run this code. Here you can see a plot somewhat like this: the blue line is my original data, my rolling mean is in red and my rolling standard deviation is in black. From this you can conclude that the mean, and even the standard deviation, is not constant, so our data is not stationary. The rolling statistics method is, again, a visual technique, and with it we have concluded that this is not a stationary dataset.

Now let me perform the Dickey-Fuller test as well. To perform it in Python, you write from statsmodels.tsa.stattools import adfuller; this is the function provided for the Dickey-Fuller test. I pass the dataset, that is, the number of passengers, into adfuller, and I set autolag equal to AIC. AIC is the Akaike information criterion: roughly, it scores how well a candidate model fits the actual values while penalizing extra parameters, and here it is used to pick the number of lags. Don't worry about it for now, guys; just think of it as a metric and see what happens when we run this test. When we run it, we get the test statistic, the p-value, the number of lags used and the number of observations used, and then we print the values, including the critical values, in a loop. Let me run this cell as well.
This print statement prints all the values: the test statistic, the p-value, the number of lags used, the number of observations, and the critical values at different percentages. To reject the null hypothesis, the p-value should be small, typically below 0.05, but here we have a very large value, 0.9. Also, to reject, the critical value should be greater than the test statistic, and it is not. So we cannot reject the null hypothesis, and we can say that the data is not stationary. Since the Dickey-Fuller results also tell us the data is not stationary, next we'll estimate the trend. Here we have taken the log of the indexed dataset, that is, the dataset with time as its index, set month-wise. Let me run this. If you look at the y-axis, the numbers have changed because taking the log changes the scale itself, but the trend remains the same. Next, let us calculate the moving average with the same window, but keep in mind, guys, this time we'll be using the log time series. Again the window is 12, that is, twelve months, and we plot the graph with the log time series, since the data is already in log form. Let me just print it.
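A sketch of the log step on synthetic data standing in for the passenger counts: a series that grows multiplicatively, with yearly swings proportional to its size. The growth rate and swing size are arbitrary assumptions. After `np.log`, the trend is still there, so the 12-month moving average of the log series keeps rising.

```python
import numpy as np
import pandas as pd

# Made-up monthly series whose level and swings both grow over time.
idx = pd.date_range("1949-01-01", periods=48, freq="MS")
t = np.arange(len(idx))
passengers = pd.Series(
    100 * np.exp(0.02 * t) * (1 + 0.1 * np.sin(2 * np.pi * t / 12)), index=idx
)

# Log transform: the y-axis scale changes, the upward trend does not.
log_passengers = np.log(passengers)

# Rolling statistics on the log series, same 12-month window.
moving_avg = log_passengers.rolling(window=12).mean()
moving_std = log_passengers.rolling(window=12).std()

# The rolling mean still rises month after month, so the log series
# on its own is not stationary yet.
print(moving_avg.dropna().is_monotonic_increasing)  # True
```

The log mainly stabilizes swings that grow with the level of the series; removing the trend takes a further step, such as subtracting a moving average or differencing.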
Here you can conclude that the mean is not stationary. It is quite a bit better than before, but it still moves with time and the trend is again upward, so the data is still not stationary. Next, we'll get the difference between the moving average and the actual number of passengers, that is, between the mean and the actual time series. Why are we doing this? Because unless we perform these transformations, we will not get a stationary time series. You might be wondering whether this is the standard way to make a time series stationary. No, it's not, guys; it depends on your time series. Sometimes you have to take a log, sometimes a square, sometimes a cube root; it all depends on what your data holds. Here we're using the log scale, so we take the moving average and subtract it from the log series. We print the head, that is, the top 12 values. Then we remove the NaN values with dropna, passing inplace=True inside the parentheses, and print the head again. Let me run this. Here we have the month and, for the number of passengers, the difference values.

Moving ahead, I have purposely put the full code of the ADCF test here. Above, I just applied a simple ADF function, but this is the whole code, guys, and you can run it whenever you need to determine whether a time series is stationary or not. I have defined a function called test_stationarity that performs both tests: it computes the rolling statistics and performs the Dickey-Fuller test. Here I have used a window of 12, plotted the rolling statistics, and performed the Dickey-Fuller test as well. Let me run this and call the function. Now you have the original data as the blue line, the rolling standard deviation as the black line and the rolling mean as the red line. Visually, you can notice there is no such trend, or at least it is much better than what we saw earlier, in both the rolling standard deviation and the rolling mean.
Now let me look at the ADCF results as well. If you notice, the p-value is relatively small: earlier we had 0.9-something, and now we have a p-value of 0.02. Your test statistic is also now close to the critical values. Together these help you determine whether your data is stationary; with a p-value below 0.05, we can reject the null hypothesis at the 5% level. So I hope by now you've got the idea of how the Dickey-Fuller test and rolling statistics determine whether the data is stationary or not.

Next, I have calculated the weighted average of the time series. Why? Because we need to see the trend present inside the time series. Let me run this, and you'll see why I'm talking about it. As you can see, as the time series progresses, the average also progresses towards the higher side, so the trend is upward and keeps increasing with time. Moving ahead, let's see another transformation, where we take the log scale and subtract the weighted average from it. In the previous scenario we subtracted a simple mean, but here we'll use the weighted mean and then check for stationarity. We have subtracted them and passed the result into the test_stationarity function we defined above, which runs both tests and displays the results. Let me run the cell. Here you can notice that the standard deviation is quite flat; it is not moving here and there, and you can also say it doesn't have any trend. The rolling mean is also quite a bit better than the previous one. Now let me see the ADCF results as well: you have a very low p-value, 0.005, which means your time series is again stationary. So you can use either of these transformations to make your data stationary and then check it. Now we know that the data is stationary.
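The walkthrough doesn't spell out the exact weighting, so this sketch assumes an exponentially weighted mean via pandas `ewm` with a 12-month half-life, applied to a synthetic log series. Recent months get more weight than older ones, and with `min_periods=0` there is no 11-month warm-up of NaN values to drop.

```python
import numpy as np
import pandas as pd

# Synthetic log series: linear upward trend plus a yearly swing.
idx = pd.date_range("1949-01-01", periods=60, freq="MS")
t = np.arange(len(idx))
log_series = pd.Series(np.log(100) + 0.02 * t + 0.1 * np.sin(2 * np.pi * t / 12), index=idx)

# Exponentially weighted mean (assumed weighting: 12-month half-life).
weighted_avg = log_series.ewm(halflife=12, min_periods=0, adjust=True).mean()

# Subtract it from the log series; unlike the simple rolling mean,
# every month gets a value, so nothing needs to be dropped.
log_minus_weighted = log_series - weighted_avg
print(log_minus_weighted.isna().sum())  # 0
```

The resulting `log_minus_weighted` series is what would be passed to a stationarity check such as the one sketched earlier.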
Next, we'll shift the values in the time series so that we can use it for forecasting. Earlier we subtracted the mean from the actual values; now we'll use a function called shift to shift all of those values. Let me run this plot; this is how it looks. Here we have taken a lag of one, that is, shifted the values by 1, or you can say we have differenced the time series once. Why? If you remember, I talked about the ARIMA model, which has three parts: the AR model, which stands for autoregressive, the MA model, which stands for moving average, and I, for integration. The ARIMA model takes three parameters, and d stands for the integration part, that is, how many times you have differenced the time series, so here d becomes one. Next, I have simply dropped the NaN values. If you run this code, you will see that the output is quite flat: the augmented Dickey-Fuller test rejects the null hypothesis, so we can say the time series is now stationary. Again, blue is the original data, red is the rolling mean and black is the rolling standard deviation, and visually we see that no trend is present; it's quite flat, so the time series is stationary.
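A sketch of the shift-and-subtract step with pandas, on a made-up log series with a straight-line trend. Subtracting the series shifted by one is first-order differencing (d = 1), the same thing `Series.diff()` computes.

```python
import numpy as np
import pandas as pd

# Made-up log series with a straight-line trend.
idx = pd.date_range("1949-01-01", periods=24, freq="MS")
log_series = pd.Series(np.log(100) + 0.02 * np.arange(24), index=idx)

# Subtract each value's predecessor (a lag of one): first-order differencing.
log_shifted_diff = log_series - log_series.shift(1)

# The first entry has no predecessor, so it is NaN; drop it.
log_shifted_diff = log_shifted_diff.dropna()

print(np.allclose(log_shifted_diff, 0.02))  # True: a linear trend differenced once is flat
print(np.allclose(log_shifted_diff, log_series.diff().dropna()))  # True: same as .diff()
```

A linear trend becomes a constant after one difference, which is why a single round of differencing (d = 1) is often enough for a roughly linear upward trend.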
Now let us see the components of the time series. Here you first need to write from statsmodels.tsa.seasonal import seasonal_decompose. seasonal_decompose separates the series into three components: trend, seasonal and residual. We have simply plotted these graphs, so let us see how they look. Let me run this. This is how the output looks: this is my original data, in which we saw a trend, and this is my trend line, which goes upward and is quite linear in nature. Along with that, strong seasonality is also present, as the seasonality graph shows, and then we have the residuals. Residuals are nothing but the irregularities present in your data; they do not have any shape or size, and you cannot find out what is going to happen next, so they are quite irregular in nature. Next, we'll check whether the noise is stationary or not. We take the residual, save it in a variable for the decomposed log data, and again pass it to the same test_stationarity function we created above, which runs both tests, rolling statistics and ADCF. Let me run this cell. Looking at the output, you can visually say that this is not stationary, which is why we need the moving average parameter in place, so that it smooths the noise out and sets us up to predict what will happen next.
Now we know the value of d, but how can we find the values of p and q, that is, the number of autoregressive lags and the order of the moving average? As I told you, we need to plot the ACF graph and the PACF graph: to find the value of p we use the PACF graph, and to find the value of q we use the ACF graph. ACF refers to the autocorrelation graph and PACF stands for the partial autocorrelation graph.
In Python, we first need to import these two functions: from statsmodels.tsa.stattools import acf, pacf. Using acf and pacf, we pass in our dataset, and for the PACF we have chosen the OLS method. There are various methods, but we usually prefer OLS, which stands for ordinary least squares. Then we have simply plotted the ACF graph and the PACF graph.
Now let me just run this, and let's determine how you can find the values of p and q. This is my autocorrelation graph and this is my partial autocorrelation graph. To find p and q, you need to check the lag at which the graph cuts off, or drops to zero, for the first time. If you look closely at the PACF graph, it touches the upper confidence level here, so the value of p is around two. Similarly, if you look at the ACF graph, it cuts off, or drops to zero, here, so the value of q also becomes two. This is how you can find the values of p and q using the PACF graph and the ACF graph.
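Reading the cut-off from a plot can also be made mechanical. The sketch below uses one common reading of the rule: count how many lags, starting from lag 1, lie outside the approximate 95% band before the first one falls inside it; that count is the candidate order (p from the PACF, q from the ACF). The helpers sample_acf and cutoff_lag are hypothetical names, and the PACF values in the usage line are made up for illustration.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (the same biased estimator
    statsmodels' acf uses by default)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])


def cutoff_lag(coefficients, n_obs, z=1.96):
    """Number of leading lags (from lag 1) whose coefficient lies outside the
    band +/- z / sqrt(n_obs); the plot 'cuts off' after this lag."""
    band = z / np.sqrt(n_obs)
    lag = 0
    for value in coefficients[1:]:
        if abs(value) < band:
            break
        lag += 1
    return lag


# Hypothetical PACF values: outside the band at lags 1 and 2 only.
example_pacf = [1.0, 0.62, -0.31, 0.08, -0.05]
print(cutoff_lag(example_pacf, n_obs=143))  # -> 2
```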
Next, now that we have the value of p, the value of q and the value of d, we can simply substitute these values into the ARIMA model. Here I first imported the ARIMA model, and then in the ARIMA function I have listed the order: my p value is 2, I have differenced the series once, so my d value is 1, and my q value is again 2. Then I have plotted the graph and calculated the RSS, which is the residual sum of squares. Let me just run this. You can see that the residual sum of squares is quite good, at 1.02, with the values of p, q and d set to 2, 2 and 1.
Now you can also play around with the p and q parameters. Let's say I want to change the order to 2 1 0. If I do that and run this again, you can see that once I have changed the order to 2 1 0, my RSS has increased, and the greater the RSS, the worse it is for you. Let me change it again, to 0 1 2. In that case, too, my RSS has increased, to 1.4. So you need to keep an eye on the RSS: the greater the RSS, the worse the fit. So here we'll just revert to 2 1 2, where p is 2, q is 2 and we have taken only one difference, so d is 1.
Now let's consider the individual models. For a moving average (MA) model the value of p is 0, so the order is 0 1 2. For an autoregressive (AR) model the value of q is 0, so the order is 2 1 0. Here I have 2 1 0; let me run it for you. You can see that the RSS has again reached 1.5, so with the autoregressive part alone the RSS is 1.5. If I go back to the MA model, with the order 0 1 2, the RSS is 1.4. So we conclude that the autoregressive model gives an RSS of 1.5 and the moving average model gives 1.4, and if we combine both of them into an ARIMA model, that is, 2 1 2, we get a much lower RSS. Let me run this as well: when I substitute 2 1 2, that is, p and q equal to 2 and d equal to 1, the ARIMA model gives an RSS of 1.02, which is quite good. So with AR the RSS is 1.5, with MA it is 1.4, and when we apply the combined ARIMA model the residual sum of squares drops to 1.02.
So now let's do some fitting on the time series data we have. Here we have converted the fitted values into a pandas Series and printed its head. Let me run this: here we have the month as well as the prediction for each month. Next, we find the cumulative sum of these values. To calculate the cumulative sum we have the function cumsum, and again we have printed the head, so this is my result. Finally, we build the predictions on the log scale by adding this cumulative sum to the first value of the log-scaled series, and we print the head of that. Let me run this. Also keep in mind that after performing these transformations, we need to take the exponential of the whole data so that it comes back to the original form we started from; to see the values in that form, you need to take the exponent. So these are the three steps that are very important for the data transformation: we find the cumulative sum, we do the predictions, and we calculate the exponent, so as to get the data back in its original format.
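A small, self-contained sketch of the three steps, using pandas on a handful of illustrative monthly values (not the session's data). To keep the arithmetic checkable, the true log differences stand in for the model's fitted values, so the round trip should reproduce the original numbers; with real model output you would get the fitted passenger counts instead. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# A few illustrative monthly passenger counts (not the session's data).
index = pd.date_range("1949-01-01", periods=6, freq="MS")
passengers = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0, 135.0], index=index)
log_scale = np.log(passengers)

# Stand-in for fitted values on the differenced log scale (assumption: the
# true differences are used, so the reconstruction should be exact).
fitted_diff = log_scale.diff().dropna()

# Step 1: cumulative sum of the predicted differences.
diff_cumsum = fitted_diff.cumsum()

# Step 2: add it to the first observed log value to get log-scale predictions.
predictions_log = pd.Series(log_scale.iloc[0], index=log_scale.index)
predictions_log = predictions_log.add(diff_cumsum, fill_value=0)

# Step 3: exponentiate to undo the log transform.
predictions = np.exp(predictions_log)
print(predictions.round(1).tolist())
```

With the newer statsmodels.tsa.arima.model.ARIMA class, fitted values for a d = 1 model are already on the log (level) scale, so in that case only the final np.exp step should be needed.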
Now, after that, we plot the actual values against how our model has fitted them. Let me run this. You can see that the orange line is the model we have fitted: only the magnitude varies, whereas the shape has been properly captured by the ARIMA model.

Now, how can we make predictions, guys? There is a function in Python for that: predict. Before predicting the values, let me first check how many rows there are in my dataset. Let me run this: the dataset starts in 1949, it has the number of passengers, it goes on to 1960, and it has 144 rows and one column. So what if I want to predict for the next ten years? You have to work out how many data points you want. The data is monthly, so for ten years the number of data points is 12 × 10 = 120. Using the plot_predict function, I can then plot the predicted future: I give it the first index of the time series and then the total number of data points I want the time series to cover. I have 144 rows plus 120 because I want ten years, and 144 + 120 = 264, so I'll write that here.
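The data-point arithmetic above can be written out in a few lines of pandas. This is an illustrative sketch: the 1949 to 1960 monthly index mirrors the dataset described in the session, and the variable names are hypothetical. It also builds the monthly dates the 120 forecast points would cover.

```python
import pandas as pd

# Monthly history: 144 rows, January 1949 to December 1960.
history = pd.date_range("1949-01-01", periods=144, freq="MS")

years_ahead = 10
steps = 12 * years_ahead            # 12 months x 10 years = 120 data points
last_index = len(history) + steps   # 144 + 120 = 264, the value used in the session

# Month labels that the 120 forecast points would cover.
future = pd.date_range(history[-1] + pd.DateOffset(months=1), periods=steps, freq="MS")
print(steps, last_index, future[0].date(), future[-1].date())
```

In the newer statsmodels API, predict takes zero-based start and end positions with an inclusive end, so it is worth checking that convention before reusing 264 directly.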
Now let me comment this out for now and run it. Here, the blue line is the forecasted value and the gray part is the confidence interval: the actual future values are expected to stay within this band with high probability (95% by default), although that is not a guarantee. So this is what the prediction looks like for the next ten years, and this is how you can do prediction. If you don't want to see the graph, you can ask for the data points directly: here I want the prediction for ten years, so I have typed in steps=120, and you get the result in an array format. That is how you can work with this data and predict for, let's say, six months, 12 months, the next year or 10 years; it's totally up to you guys.
Whatever topics I've covered, I hope these are clear to you. Now let me go back to the presentation and see what we have left. Here we have just built a model with which we forecasted the demand for the next 10 years; in our dataset we have the date on a monthly basis and the number of passengers. So that's all for today, guys. Let me just recap what we have covered. We started off by discussing what exactly a time series is, and we went through its various components: trend, seasonality, cyclicity and irregularity. Then we understood what stationarity is and the different tests used to check the stationarity of the data. Then we discussed one of the best-known models used in time series analysis, the ARIMA model. We understood that the ARIMA model is a combination of three parts: AR, which stands for autoregression, MA for moving average, and I for the integration part.
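For reference, the standard ARIMA(p, d, q) equation shows how the three parts fit together. Here y_t would be the log passenger count, the differencing operator applied d times is the I part, the phi terms are the AR part, the theta terms are the MA part and epsilon_t is white noise. With the session's order 2 1 2, d = 1, so y'_t = y_t - y_{t-1}. This is the textbook form, written with a plus sign on the MA terms (a common convention); it is not taken from the session's slides.

```latex
\begin{aligned}
\Delta y_t &= y_t - y_{t-1}, \qquad y'_t = \Delta^{d} y_t \\
y'_t &= c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t \\
\text{ARIMA}(2,1,2):\quad y'_t &= c + \phi_1 y'_{t-1} + \phi_2 y'_{t-2} + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \varepsilon_t
\end{aligned}
```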
Then we implemented all these things and forecasted the data for the next ten years. I hope you guys are clear with the concepts I have taught in this session. Do you have any questions or doubts about any of the topics I have discussed? All right, I don't see any doubts here. No problem, guys, this takes time, so just go home, go through the code again and practice as much as you can. In case you have any doubt or run into any error, you can always come back to me or simply ask me in my next session. I hope you found the session informative. Thank you so much, bye-bye!
I hope you have enjoyed listening to this video. Please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist and subscribe to the Edureka channel to learn more. Happy learning!
