Spider for Floor Reply Lottery on V2EX
Last night I saw a book giveaway on V2EX: the first three people whose replies come closest to the last two digits of the Shanghai Composite Index win a copy of the book.
The original post, translated from Chinese, read:
Reply with any two-digit number (for example, 37). We will take the tens and ones digits of the Shanghai Composite Index at the close of trading on October 20, 2016 (for example, if it closes at 3789, that is "89"). The first three people with the closest guesses will each receive a copy of "Python Web 开发实战" (Python Web Development in Practice).
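To make the rule concrete, here is a small Python sketch. The function names `last_two_digits` and `pick_winners` are hypothetical, and reading "closest" as the smallest absolute difference, with earlier replies winning ties, is my assumption rather than something the organizer stated.

```python
def last_two_digits(index_close):
    """Return the tens and ones digits of the closing index, e.g. 3789 -> 89."""
    return int(index_close) % 100


def pick_winners(guesses, index_close, n=3):
    """Pick the n closest guesses.

    guesses: list of (username, number) tuples in reply order.
    Assumption: "closest" means smallest absolute difference, and earlier
    replies win ties.
    """
    target = last_two_digits(index_close)
    # sorted() is stable, so guesses at equal distance keep their reply order.
    ranked = sorted(guesses, key=lambda guess: abs(guess[1] - target))
    return [user for user, _ in ranked[:n]]
```

With a close of 3789, the target is 89, so a reply of 89 beats replies of 88 or 90, and those in turn beat 37.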
By that time the thread already had more than 900 replies. If a number has already been chosen by three or more people, choosing it again is pointless, so I decided to write a crawler to see which numbers were still worth picking.
Output
Below are the results, sorted once by number of occurrences and once by the guessed number.
Counter({66: 35,
Result
Only the numbers 2, 4, 6, 8, 9, 82, 84, 91, 94, 96 and 97 have been mentioned fewer than 3 times. In other words, if I enter the draw now, choosing any other number is pointless.
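The filter behind this list can be sketched as follows. The helper name `still_open`, the default candidate range 0 to 99, and the sample counts (apart from `66: 35`) are assumptions made for illustration, not the real thread data.

```python
from collections import Counter


def still_open(counts, limit=3, candidates=range(100)):
    """Return the candidate numbers that were guessed fewer than `limit` times."""
    # A Counter returns 0 for numbers nobody has guessed yet.
    return [n for n in candidates if counts[n] < limit]


# Made-up counts for illustration only.
example = Counter({66: 35, 50: 4, 82: 1})
print(still_open(example, candidates=[50, 66, 82, 97]))  # [82, 97]
```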
Principle
The giveaway page can be viewed without logging in. Each page of comments is fetched with a GET request of the form url?=number.
Because only each person's first numeric reply counts, the username and content of every reply have to be saved so that duplicates can be removed.
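A minimal sketch of that deduplication step might look like this. The function name is hypothetical, and skipping replies that contain no digits at all is my reading of "first numeric reply", not necessarily what the original script did.

```python
def first_numeric_reply_per_user(replies):
    """Map each username to the content of their first reply that contains a digit.

    replies: iterable of (username, content) tuples in floor order.
    """
    first = {}
    for username, content in replies:
        if username in first:
            continue  # this user's guess is already recorded
        if any(ch.isdigit() for ch in content):
            first[username] = content
    return first
```

Keeping the raw reply text rather than the parsed number leaves the number extraction as a separate step, which makes each part easier to debug.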
With a few exceptions, the first number that appears in a reply is the number the user guessed, so a regular expression with findall is used to get the first matching number.
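As a rough illustration, the extraction could look like the sketch below. The pattern `\d+` and the function name `first_guess` are assumptions, since the original expression is not shown here.

```python
import re

# Assumed pattern: any run of digits counts as a number.
NUMBER = re.compile(r"\d+")


def first_guess(reply_text):
    """Return the first number in the reply as an int, or None if there is none."""
    matches = NUMBER.findall(reply_text)
    if not matches:
        return None
    return int(matches[0])
```

A reply such as "I pick 37, or maybe 42" yields 37. Replies where the first number is not a guess (a date, for example) are the exceptions mentioned above and would need manual handling.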
So the process is: first crawl all the pages, then use XPath to extract the username and content of each reply, then use the regular expression to match the number, and finally count the results.
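The whole pipeline can be sketched end to end on a tiny stand-in page. Everything about the markup here (the `reply`, `user` and `content` class names) is hypothetical and not V2EX's real HTML. The sketch uses the limited XPath support in the standard library's `xml.etree.ElementTree`, which only works on well-formed markup; a real page would likely need an HTML-tolerant parser such as lxml.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, well-formed stand-in for one downloaded comment page.
SAMPLE_PAGE = """
<html><body>
  <div class="reply"><a class="user">alice</a><div class="content">I guess 37</div></div>
  <div class="reply"><a class="user">bob</a><div class="content">66!</div></div>
  <div class="reply"><a class="user">alice</a><div class="content">or maybe 66</div></div>
</body></html>
""".strip()


def extract_replies(page_source):
    """Yield (username, content) pairs using ElementTree's XPath subset."""
    root = ET.fromstring(page_source)
    for reply in root.findall(".//div[@class='reply']"):
        user = reply.find("./a[@class='user']")
        content = reply.find("./div[@class='content']")
        if user is not None and content is not None:
            yield user.text, "".join(content.itertext())


def count_guesses(pages):
    """Count each user's first numeric guess across all pages."""
    seen = set()
    counts = Counter()
    for page in pages:
        for username, content in extract_replies(page):
            numbers = re.findall(r"\d+", content)
            if username in seen or not numbers:
                continue  # repeat guess, or no number in this reply
            seen.add(username)
            counts[int(numbers[0])] += 1
    return counts


print(count_guesses([SAMPLE_PAGE]))  # Counter({37: 1, 66: 1})
```

Alice's second reply is ignored because her first numeric guess (37) has already been counted.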
Code
First, download the pages to local disk, which makes debugging easier.
# Download the page
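Since the download code is cut off here, the following is only a sketch of how the step might be done with the standard library. The function names, the `page_N.html` file naming, and the idea of passing the query parameter name in as `param` are all assumptions; the real parameter name is not given in this post.

```python
import os
import urllib.request


def page_url(base_url, param, page):
    """Build the URL of one comment page; `param` is the query parameter name."""
    return "{}?{}={}".format(base_url, param, page)


def fetch(url):
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def download_pages(base_url, param, last_page, out_dir, fetch=fetch):
    """Save pages 1..last_page as out_dir/page_N.html and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for page in range(1, last_page + 1):
        path = os.path.join(out_dir, "page_{}.html".format(page))
        if not os.path.exists(path):  # skip pages that are already on disk
            with open(path, "wb") as handle:
                handle.write(fetch(page_url(base_url, param, page)))
        paths.append(path)
    return paths
```

Because pages already on disk are skipped, the parsing code can be rerun many times while debugging without hitting the site again.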
Then extract the username and comment from each reply, remove duplicates, and apply the regular expression.
import re
Then sort the results. Two orderings are used.
import pprint
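The two orderings can be sketched with `collections.Counter` as below. The counts other than `66: 35` are made up for illustration, and the variable names are my own.

```python
from collections import Counter
import pprint

# Made-up counts for illustration; the real data comes from the crawl.
counts = Counter({66: 35, 50: 12, 8: 1, 82: 2})

# Ordering 1: most frequently guessed numbers first.
by_occurrence = counts.most_common()

# Ordering 2: ascending by the guessed number itself.
by_number = sorted(counts.items())

pprint.pprint(by_occurrence)
pprint.pprint(by_number)
```

The first ordering shows which numbers are crowded, while the second makes it easy to scan for gaps in the 0 to 99 range.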
See here for the complete code
Conclusion
After looking at the Shanghai Composite Index over the past week, I found that its last two digits hovered around 50±15. Among the remaining numbers, the closest to that range was 82, so I chose 82.
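The choice can be checked with a couple of lines of Python, using the list of numbers mentioned fewer than 3 times and taking 50 as the center of the observed range. The variable names are my own.

```python
# Numbers mentioned fewer than 3 times, taken from the Result section.
remaining = [2, 4, 6, 8, 9, 82, 84, 91, 94, 96, 97]
center = 50

# Pick the remaining number with the smallest distance to the center.
closest = min(remaining, key=lambda n: abs(n - center))
print(closest, abs(closest - center))  # 82 32
```

The nearest low candidate, 9, is 41 away from 50, so 82 (32 away) is the best of the remaining numbers even though it lies well outside the 50±15 band.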
———————————update———————————
The Shanghai Composite Index closed with 84 as its last two digits. I just missed it…