pct_aa_cases = to_percentage(aa_cases, known_cases)
pct_aa_deaths = to_percentage(aa_deaths, known_deaths)

return self._make_dataframe(
I'm not a huge fan of the interface here. Before, ._scrape was expected to return a list of pd.Series, but now it can return either a list of Series or a DataFrame. That being said, maybe this is something we wanna refactor later instead of now.
I understand that depending on the data (a single row or multiple rows) we ultimately want to massage it differently so that a DataFrame is always the final result.
I'm wondering if we can maybe hide the details of this behind a class (or something else).
in scraper.py:
    import pandas as pd

    class ScrapedData(object):
        def __init__(self, *args, **kwargs):
            # store variables
            self.series = None

        @classmethod
        def from_pd_series(cls, series):
            scraped_data = cls()
            scraped_data.series = series
            return scraped_data

        @classmethod
        def handle_error(cls, date, msg):
            scraped_data = cls(date=date, status=msg)
            return scraped_data

        def make_dataframe(self, *args, **kwargs):
            # `is not None` because the truth value of a pandas Series is ambiguous
            if self.series is not None:
                return pd.DataFrame(self.series)
            # otherwise, regular lists of data were just passed in.
            # .as_list the arguments, then use those values to create a dataframe and return, like in ScraperBase._make_dataframe()
in ScraperBase.run():
    try:
        scraped_data = self._scrape(start_date=start_date, end_date=end_date, **kwargs)
    except Exception as e:
        scraped_data = ScrapedData.handle_error(date=date, msg=str(e))
    ret = scraped_data.make_dataframe()
in the scrapers:
    return ScrapedData(dates=dates, cases=cases, deaths=deaths, aa_cases=aa_cases, ...)
or
    return ScrapedData.from_pd_series(series=series)
Yeah, definitely needs some clean up. The error handling esp. makes me uncomfortable.
The right abstraction should probably use the same verbs for CT and NYC cases, which get all the data at once and fetch data for each separate day, resp. Will poke at it a bit more!
(Though "must return a list (possibly empty) of pandas Series objects, or a DataFrame" has always been the _scrape return type, per its docstring.)
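One possible reading of "the same verbs", sketched under assumptions: both kinds of scraper expose a single method for a date range, and only the fetching strategy differs. The names BulkScraper, DailyScraper, rows_between, fetch_all and fetch_day are hypothetical and do not come from this PR.

```python
from datetime import date, timedelta


class BulkScraper:
    """CT-style (assumed): one request returns every day's data."""

    def __init__(self, fetch_all):
        # Hypothetical callable: () -> {date: row}
        self._fetch_all = fetch_all

    def rows_between(self, start, end):
        everything = self._fetch_all()
        # Keep only the days inside the requested range, in date order.
        return [everything[d] for d in sorted(everything) if start <= d <= end]


class DailyScraper:
    """NYC-style (assumed): one request per day."""

    def __init__(self, fetch_day):
        # Hypothetical callable: date -> row, or None when no data exists
        self._fetch_day = fetch_day

    def rows_between(self, start, end):
        rows = []
        day = start
        while day <= end:
            row = self._fetch_day(day)
            if row is not None:
                rows.append(row)
            day += timedelta(days=1)
        return rows
```

With this shape, the caller would not need to know which strategy a given scraper uses; it only asks for rows in a date range.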
oo yeah, totally missed the docstring, my bad!
if self.github_access_token:
    _logger.debug('Using access token for Github API')
    github = Github(self.github_access_token)
else:
    _logger.warn('Using unauthenticated for Github API: '
                 'be careful of hitting the rate limit')
    github = Github()
maybe useful to hide this behind a util?
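A minimal sketch of what such a util could look like, assuming the client class is passed in (in the PR it would presumably be PyGithub's Github class, called the same way as in the diff). The name make_github_client and the client_cls parameter are hypothetical, not part of this PR.

```python
import logging

_logger = logging.getLogger(__name__)


def make_github_client(client_cls, access_token=None):
    """Build an API client, authenticated when a token is available.

    client_cls is assumed to accept an optional token as its first
    positional argument, as Github(...) is called in the diff above.
    """
    if access_token:
        _logger.debug('Using access token for Github API')
        return client_cls(access_token)
    _logger.warning('Using unauthenticated access for Github API: '
                    'be careful of hitting the rate limit')
    return client_cls()
```

Call sites would then reduce to something like github = make_github_client(Github, self.github_access_token).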
fredtsun left a comment
lgtm! I think there are some implementation details that I have comments on, but not pressing!
e657f35 to 2982886
2982886 to ce2a694
ce2a694 to e4fc3d1
NYC:
CT: