Skip to content

NYC & CT multi-day#83

Open
nkrishnaswami wants to merge 3 commits intomasterfrom
nyc_ct_tweaks
Open

NYC & CT multi-day#83
nkrishnaswami wants to merge 3 commits intomasterfrom
nyc_ct_tweaks

Conversation

@nkrishnaswami
Copy link
Copy Markdown
Collaborator

@nkrishnaswami nkrishnaswami commented Jul 20, 2020

NYC:

  • updated to be a multiday scraper
    CT:
  • updated to be a multiday scraper (via switch to pandas)
  • make denominator exclude unknown race

@nkrishnaswami nkrishnaswami requested a review from fredtsun July 20, 2020 03:20
@nkrishnaswami nkrishnaswami changed the title NYC & CT tweaks NYC & CT multi-day Jul 20, 2020
pct_aa_cases = to_percentage(aa_cases, known_cases)
pct_aa_deaths = to_percentage(aa_deaths, known_deaths)

return self._make_dataframe(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm not a huge fan of the interface here. Before, ._scrape was expected to return a list of pd.Series but now can return a list of series as well as a DataFrame. That being said, maybe this is something we wanna just refactor later instead of now.

I understand that depending on the data (a single row or multiple rows) we ultimately want to massage it differently such that a DataFrame is the final result.

I'm wondering if we can maybe hide the details of this through a class (or something else)

in scraper.py:

class ScrapedData(object):
  def __init__(self,  *args, **kwargs):
      # store variables
      self.series =None

 @classmethod
 def from_pd_series(cls, series):
    scrapped_data = cls()
    scrapped_data.series = series
    return scrapped_data

@classmethod
def handle_error(self, date, msg):
    scrapped_data = cls(date=date, status=msg)
    return scrapped_data
   
  def make_dataframe(self, *args, **kwargs):
      if self.series:
         return pd.DataFrame(self.series)
       # otherwise, regular lists of data was just passed in.
       # .as_list the arguments, then use those values to create a dataframe and return, like in ScraperBase._make_dataframe()

in ScaperBase.run():

 try:
   scrapped_data = self._scrape(start_date=start_date, end_date=end_date, **kwargs)
 except Exception as e:
   scrapped_data = ScrappedData.handle_error(date=date, exception=e)
 ret = scrapped_data.make_dataframe()

in the scrapers:

return ScrapedData(dates=dates, cases=cases, deaths=deaths, aa_cases=aa_cases, ...)

or

return ScrapedData.from_pd_series(series=series)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely needs some clean up. The error handling esp. makes me uncomfortable.
The right abstraction should probably use the same verbs for CT and NYC cases, which get all the data at once and fetch data for each separate day, resp. Will poke at it a bit more!

(Though _scrape must return a list (possibly empty) of pandas Series objects, or a DataFrame. has always been the _scrape return type (docstring).)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oo yeah, totally missed the docstring, my bad!

Comment on lines +145 to +151
if self.github_access_token:
_logger.debug('Using access token for Github API')
github = Github(self.github_access_token)
else:
_logger.warn('Using unauthenticated for Github API: '
'be careful of hitting the rate limit')
github = Github()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe useful to hide this behind a util?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Copy Markdown
Collaborator

@fredtsun fredtsun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! I think there are some implementation details that I have comments on, but not pressing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants