NYC & CT multi-day by nkrishnaswami · Pull Request #83 · d4bl/COVID19_tracker_data_extraction

nkrishnaswami · 2020-07-20T03:18:43Z

NYC:
- updated to be a multi-day scraper

CT:
- updated to be a multi-day scraper (via switch to pandas)
- make the percentage denominators exclude unknown race
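Excluding unknown race from the denominator means each group's share is computed against cases whose race is known, not against all cases. A minimal sketch, assuming a hypothetical `pct_of_known` helper (it is not the repository's `to_percentage`) and made-up counts:

```python
def pct_of_known(count, total, unknown):
    """Percentage of `count` among cases whose race is known.

    Hypothetical helper: the denominator is total - unknown, so records
    with unknown race do not dilute each group's share.
    """
    known = total - unknown
    if known <= 0:
        # No known-race cases: the share is undefined.
        return float('nan')
    return 100.0 * count / known


# Illustrative numbers only: 1,000 cases, 200 of them with unknown race.
print(pct_of_known(200, 1000, 200))  # 25.0 (20.0 with the full denominator)
```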

fredtsun · 2020-07-20T21:19:58Z

+        pct_aa_cases = to_percentage(aa_cases, known_cases)
+        pct_aa_deaths = to_percentage(aa_deaths, known_deaths)
+
+        return self._make_dataframe(


I'm not a huge fan of the interface here. Before, _scrape was expected to return a list of pd.Series, but now it can return either a list of Series or a DataFrame. That said, maybe this is something we want to refactor later rather than now.

I understand that, depending on the data (a single row or multiple rows), we ultimately want to massage it differently so that a DataFrame is the final result.

I'm wondering if we could hide these details behind a class (or something else):

in scraper.py:

import pandas as pd

class ScrapedData(object):
    def __init__(self, *args, **kwargs):
        # store variables
        self.kwargs = kwargs
        self.series = None

    @classmethod
    def from_pd_series(cls, series):
        scraped_data = cls()
        scraped_data.series = series
        return scraped_data

    @classmethod
    def handle_error(cls, date, msg):
        scraped_data = cls(date=date, status=msg)
        return scraped_data

    def make_dataframe(self, *args, **kwargs):
        # `if self.series:` raises for a pd.Series (ambiguous truth value), so test for None
        if self.series is not None:
            return pd.DataFrame(self.series)
        # otherwise, regular lists of data were just passed in.
        # .as_list the arguments, then use those values to create a dataframe
        # and return, like in ScraperBase._make_dataframe()

in ScraperBase.run():

try:
    scraped_data = self._scrape(start_date=start_date, end_date=end_date, **kwargs)
except Exception as e:
    scraped_data = ScrapedData.handle_error(date=date, msg=str(e))
ret = scraped_data.make_dataframe()

in the scrapers:

return ScrapedData(dates=dates, cases=cases, deaths=deaths, aa_cases=aa_cases, ...)

or

return ScrapedData.from_pd_series(series=series)
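A runnable variant of the `ScrapedData` idea, filling in the branch that the proposal leaves as a comment so that both construction paths end in a DataFrame. It assumes (this is not in the PR) that the keyword arguments are equal-length, per-column lists, and that `from_pd_series` receives a list of Series, one per row, matching the documented _scrape return type. Error handling is left out.

```python
import pandas as pd


class ScrapedData(object):
    """Sketch of a wrapper hiding how a scraper produced its rows."""

    def __init__(self, **columns):
        # Each keyword maps a column name to a list of values,
        # one per scraped date (assumed to be equal length).
        self.columns = columns
        self.series = None

    @classmethod
    def from_pd_series(cls, series):
        # `series` is a list of pd.Series, one per row.
        scraped_data = cls()
        scraped_data.series = list(series)
        return scraped_data

    def make_dataframe(self):
        if self.series is not None:
            # Each Series becomes one row; its index becomes the columns.
            return pd.DataFrame(self.series).reset_index(drop=True)
        # Per-column lists: one row per position in the lists.
        return pd.DataFrame(self.columns)


multi = ScrapedData(dates=['2020-07-01', '2020-07-02'], cases=[10, 12])
single = ScrapedData.from_pd_series(
    [pd.Series({'dates': '2020-07-02', 'cases': 12})])
print(multi.make_dataframe().shape)   # (2, 2)
print(single.make_dataframe().shape)  # (1, 2)
```

Callers of the wrapper only ever see `make_dataframe()`, which is the point of the proposal: whether a scraper produced one row or many stays an internal detail.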

Yeah, this definitely needs some cleanup; the error handling especially makes me uncomfortable.
The right abstraction should probably use the same verbs for both cases: CT, which gets all the data at once, and NYC, which fetches data for each day separately. Will poke at it a bit more!

(Though "_scrape must return a list (possibly empty) of pandas Series objects, or a DataFrame." has always been the _scrape return type, per its docstring.)
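One way to give the CT and NYC cases the same verbs is a base class whose `scrape_range` defaults to fetching one day at a time (the NYC shape), which a bulk source overrides by fetching the whole history once and slicing it (the CT shape). A sketch under stated assumptions: every class and method name here is hypothetical, and the network fetches are faked with in-memory DataFrames.

```python
import datetime

import pandas as pd


class RangeScraper:
    """Hypothetical base: every scraper answers scrape_range(start, end)."""

    def scrape_day(self, day):
        raise NotImplementedError

    def scrape_range(self, start_date, end_date):
        # Default for per-day sources: fetch each day separately
        # and stack the resulting one-row frames.
        days = pd.date_range(start_date, end_date, freq='D')
        frames = [self.scrape_day(day) for day in days]
        if not frames:
            return pd.DataFrame()
        return pd.concat(frames, ignore_index=True)


class PerDayScraper(RangeScraper):
    def scrape_day(self, day):
        # Stand-in for one request per day.
        return pd.DataFrame({'date': [day], 'cases': [day.day]})


class BulkScraper(RangeScraper):
    def fetch_all(self):
        # Stand-in for a source that returns the whole history at once.
        return pd.DataFrame({
            'date': pd.date_range('2020-07-01', periods=5),
            'cases': [1, 2, 3, 4, 5],
        })

    def scrape_range(self, start_date, end_date):
        # Fetch once, then keep only rows inside [start_date, end_date].
        df = self.fetch_all()
        mask = df['date'].between(pd.Timestamp(start_date),
                                  pd.Timestamp(end_date))
        return df[mask].reset_index(drop=True)


start, end = datetime.date(2020, 7, 2), datetime.date(2020, 7, 4)
print(len(PerDayScraper().scrape_range(start, end)))  # 3
print(len(BulkScraper().scrape_range(start, end)))    # 3
```

With this split, `run()` only needs to call `scrape_range`, and each scraper decides internally whether that means one request or one request per day.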

Oh yeah, totally missed the docstring, my bad!

fredtsun · 2020-07-20T21:23:56Z

+        if self.github_access_token:
+            _logger.debug('Using access token for Github API')
+            github = Github(self.github_access_token)
+        else:
+            _logger.warn('Using unauthenticated for Github API: '
+                         'be careful of hitting the rate limit')
+            github = Github()


Maybe it'd be useful to hide this behind a util?
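Hiding the authenticated-vs-anonymous choice behind a small helper might look like the following. `make_github_client` is a hypothetical name; the sketch assumes PyGithub's `Github` class, which accepts an optional access token as its first argument, and takes the class as a parameter so it can be exercised without network access. It logs with `warning`, since `Logger.warn` is a deprecated alias.

```python
import logging

_logger = logging.getLogger(__name__)


def make_github_client(access_token=None, github_cls=None):
    """Return a GitHub API client, authenticated when a token is available.

    Hypothetical helper; `github_cls` defaults to PyGithub's Github class.
    """
    if github_cls is None:
        from github import Github  # PyGithub, imported only when needed
        github_cls = Github
    if access_token:
        _logger.debug('Using access token for Github API')
        return github_cls(access_token)
    _logger.warning('Using unauthenticated access for Github API: '
                    'be careful of hitting the rate limit')
    return github_cls()
```

A scraper would then call `make_github_client(self.github_access_token)` in place of the inline branch.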

fredtsun

LGTM! I have comments on some implementation details, but nothing pressing.

nkrishnaswami requested a review from fredtsun July 20, 2020 03:20

nkrishnaswami changed the title from "NYC & CT tweaks" to "NYC & CT multi-day" Jul 20, 2020

fredtsun reviewed Jul 20, 2020

fredtsun approved these changes Jul 20, 2020

nkrishnaswami added 3 commits July 25, 2020 20:43

- Switch CT to exclude unknown race from pct denominators, also pandas (f819982)
- Make NYC scraper multi-day (d7a49bf)
- Make CT scraper multiday (e4fc3d1)

nkrishnaswami force-pushed the nyc_ct_tweaks branch from e657f35 to 2982886 July 26, 2020 00:43

nkrishnaswami force-pushed the nyc_ct_tweaks branch from 2982886 to ce2a694 September 29, 2020 14:39

nkrishnaswami force-pushed the nyc_ct_tweaks branch from ce2a694 to e4fc3d1 October 23, 2020 00:36

nkrishnaswami force-pushed the master branch from d887d79 to db281af October 23, 2020 00:47

nkrishnaswami closed this Oct 28, 2020

nkrishnaswami deleted the nyc_ct_tweaks branch October 28, 2020 23:19

nkrishnaswami restored the nyc_ct_tweaks branch October 28, 2020 23:27

nkrishnaswami reopened this Oct 28, 2020

nkrishnaswami wants to merge 3 commits into master from nyc_ct_tweaks
