Re: Predicting the Tour Rotation
Posted: Thu Mar 22, 2018 9:39 am
What podcast is this?
TMBJon wrote: What podcast is this?
http://cringepodcast.com/ep-32-jim-kimo ... tz-returns
richegreen wrote: 2109 would be the first traditional "THE SHOW" tour where Al doesn't have a new album to promote.
I mean, I know Al's career usually outlasts those he parodies, but don't you think 2109 is stretching it just a bit?
Edit: Well, they just said this...
After some basic guessing, I was able to correctly predict 41/57 songs from the most recent location using the previous 2 locations as predictor variables, using Solver. Each song was assumed to be independent, and I didn't use the last location in calculating coefficients.
https://docs.google.com/spreadsheets/d/ ... sp=sharing
Sheet 1 just changed your Xs to 1s. I left the rows of 0 in there too. Column A is your column E (first location), and similarly for the rest in that sheet. I just copy-pasted your data then replaced x with 1.
So the assumption I'm making is that none of the previous concerts matter in predicting the next concert except for the previous 2. I then ask: given data about the previous 2 concerts, what are the coefficients I can multiply the 1s and 0s by to best predict the next concert to be a 1 or a 0?
Sheet 3 is the analysis: columns A and B are coefficients. For example, the second row says that 0.157 * (2 concerts ago) + 0.490 * (1 concert ago) is the best** prediction for the next concert. **The best coefficients for each row are found by using all of the row's data (excluding the last date, for reasons I'll explain in a second), then squaring the difference between the predicted value and the actual value. These squared differences are stored in columns E thru U. I then sum all those values in the row in column AA, then try to minimize that sum using Solver.
After I get my coefficients from solver in columns A and B, I use them and the actual data from concerts 18 and 19 to predict concert 20, shown in column V. I then round that to 0 or 1 in column W, and finally show the actual value of concert 20 in column X. Column Y shows whether or not the prediction was correct.
The reason you don't use the last data point in Solver to determine coefficients comes from the idea of in-sample and out-of-sample forecasting: you fit the coefficients on the in-sample data and hold back the out-of-sample data, so it can be used to judge how well your model extrapolates, which is ultimately what we're trying to do for the next concert in the future.
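For anyone who'd rather not open the spreadsheet, here's a rough Python sketch of the same idea for a single song: fit the two coefficients by least squares on everything except the last concert, then use them to predict that held-out concert. This is not the actual Solver setup. The function names and the example history are made up, it solves the least-squares problem directly instead of searching like Solver does, and the fallback for a degenerate row (like a song played every night) is my own assumption.

```python
def fit_two_lag(history):
    """Least-squares a, b so that history[t] ~ a*history[t-2] + b*history[t-1]."""
    s11 = s12 = s22 = s1y = s2y = 0.0
    for t in range(2, len(history)):
        x1, x2, y = history[t - 2], history[t - 1], history[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * y
        s2y += x2 * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        # Assumption: when the two columns carry the same information
        # (e.g. all 1s or all 0s), fall back to the 1-concert-ago term only.
        return 0.0, (s2y / s22 if s22 > 0 else 0.0)
    # Solve the 2x2 normal equations with Cramer's rule.
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


def holdout_prediction(history, threshold=0.5):
    """Fit on all concerts but the last, then predict the last one."""
    train, actual = history[:-1], history[-1]
    a, b = fit_two_lag(train)
    score = a * train[-2] + b * train[-1]
    predicted = 1 if score >= threshold else 0
    return score, predicted, predicted == actual


# Made-up play history for one song (1 = played, 0 = not played).
history = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(fit_two_lag(history[:-1]))
print(holdout_prediction(history))
```

Run it once per song (one row of the sheet each), since each song is treated as independent.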
Wait ignore the results, I have to fix something. It could just be a bunch of luck haha
but then...
Ok so I made a slight adjustment: before, I assumed he would play a song if its predicted percent was >=50%. Unfortunately that made it predict he would only play 5 songs, but it was still correct 37/53 times. I changed it to predict he will play the songs with the top 19 percentages, since he seems to play 18-20 songs each time. After doing so, the number of correct guesses was 34/53, and the number of correct songs guessed was 9/18, which I think is a pretty good guess seeing as he could've played any of the 53 songs.
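Here's a small sketch of the "top 19" rule and the two ways of counting correct guesses, in case the difference between 34/53 and 9/18 is confusing. The function names and the toy scores are made up for illustration. The two numbers do line up: guessing 19 songs with 9 hits means 10 wrong picks, and 18 played songs with 9 hits means 9 misses, so 53 - 10 - 9 = 34 songs get the right yes/no call.

```python
def pick_top_k(scores, k=19):
    """Return the k songs with the highest predicted score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def count_matches(predicted, actual, all_songs):
    """Return (hits, agreements).

    hits: predicted songs that really were played (the 9/18 style count).
    agreements: songs where the yes/no call was right (the 34/53 style count).
    """
    hits = len(predicted & actual)
    agreements = sum((s in predicted) == (s in actual) for s in all_songs)
    return hits, agreements


# Toy example with made-up scores, not the real spreadsheet values.
scores = {"A": 0.9, "B": 0.4, "C": 0.6, "D": 0.1}
predicted = pick_top_k(scores, k=2)
print(predicted, count_matches(predicted, {"A", "B"}, scores.keys()))
```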
Using similar calculations, the most likely list of songs predicted to play next is, in no particular order:
CNR
Jackson Park Express
Good Old Days
You Don't Love Me Anymore
Don't Download This Song
When I Was Your Age
Your Horoscope For Today
Dare To Be Stupid (Unplugged)
Albuquerque
Nature Trail To Hell
Close But No Cigar
UHF
Young Dumb and Ugly
Dog Eat Dog
Christmas At Ground Zero
The Saga Begins
Unplugged Medley
drum solo(s)
bass solo(s)
Ok so I did a statistical test: the probability of randomly guessing 19 songs and matching at least 9 of his songs is about 11%. This means that someone picking 19 songs at random would do at least as well as this model about 11% of the time. That sounds low, but usually we want that number to be 5% or less for a result to be called 'statistically significant'. What this means is that you shouldn't put any money on these results.
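If you want to check the 11% figure yourself, here's a quick Python sketch. It assumes the test is a hypergeometric tail probability (the post doesn't say exactly how the number was calculated): 53 possible songs, 18 of them actually played, 19 guessed at random, and we want the chance of at least 9 matches. The function name is made up.

```python
from math import comb


def prob_at_least(hits, guesses, played, total):
    """P(at least `hits` matches) when `guesses` songs are drawn at random
    from `total` songs, of which `played` were actually played."""
    denom = comb(total, guesses)
    upper = min(guesses, played)
    ways = sum(
        comb(played, k) * comb(total - played, guesses - k)
        for k in range(hits, upper + 1)
    )
    return ways / denom


p = prob_at_least(hits=9, guesses=19, played=18, total=53)
print(f"{p:.3f}")
```

Under those assumptions it comes out to roughly 0.11, which matches the number above.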