Commit 78da9fa

Source NYT (#259)
* Started adding documentation for NYT source
* Started nyt.py file in services.location module
* Testing nischal pushing to branch
* Temp directory to test parsing nyt timeseries
* Locally working nyt source on v2
* Deleted temporary folder so it hopefully passes build
* Added mock data, unit tests, and regression tests for source NYT
* Updated README to reflect new NYT source
* Fixed requested points from PR

Co-authored-by: nischalshankar <[email protected]>
1 parent 17eacea commit 78da9fa

9 files changed: +569 -7 lines changed


README.md

Lines changed: 13 additions & 6 deletions
@@ -24,18 +24,24 @@ Support multiple data-sources.
 ![Covid-19 Recovered](https://covid19-badges.herokuapp.com/recovered/latest)
 ![Covid-19 Deaths](https://covid19-badges.herokuapp.com/deaths/latest)
 
+## New York Times is now available as a source!
+
+**Specify source parameter with ?source=nyt. NYT also provides a timeseries! To view timelines of cases by US counties use ?source=nyt&timelines=true**
+
 ## Recovered cases showing 0
 
-**JHU (our main data provider) [no longer provides data for amount of recoveries](https://github.com/CSSEGISandData/COVID-19/issues/1250), and as a result, the API will be showing 0 for this statistic. Apolegies for any inconvenience. Hopefully we'll be able to find an alternative data-source that offers this.**
+**JHU (our main data provider) [no longer provides data for amount of recoveries](https://github.com/CSSEGISandData/COVID-19/issues/1250), and as a result, the API will be showing 0 for this statistic. Apologies for any inconvenience. Hopefully we'll be able to find an alternative data-source that offers this.**
 
 ## Available data-sources:
 
-Currently 2 different data-sources are available to retrieve the data:
+Currently 3 different data-sources are available to retrieve the data:
 
 * **jhu** - https://github.com/CSSEGISandData/COVID-19 - Worldwide Data repository operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).
 
 * **csbs** - https://www.csbs.org/information-covid-19-coronavirus - U.S. County data that comes from the Conference of State Bank Supervisors.
 
+* **nyt** - https://github.com/nytimes/covid-19-data - The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time.
+
 __jhu__ data-source will be used as a default source if you don't specify a *source parameter* in your request.
 
 ## API Reference
@@ -71,7 +77,8 @@ __Sample response__
 {
     "sources": [
         "jhu",
-        "csbs"
+        "csbs",
+        "nyt"
     ]
 }
 ```
@@ -87,7 +94,7 @@ GET /v2/latest
 __Query String Parameters__
 | __Query string parameter__ | __Description__ | __Type__ |
 | -------------------------- | -------------------------------------------------------------------------------- | -------- |
-| source | The data-source where data will be retrieved from *(jhu/csbs)*. Default is *jhu* | String |
+| source | The data-source where data will be retrieved from *(jhu/csbs/nyt)*. Default is *jhu* | String |
 
 __Sample response__
 ```json
@@ -117,7 +124,7 @@ __Path Parameters__
 __Query String Parameters__
 | __Query string parameter__ | __Description__ | __Type__ |
 | -------------------------- | -------------------------------------------------------------------------------- | -------- |
-| source | The data-source where data will be retrieved from *(jhu/csbs)*. Default is *jhu* | String |
+| source | The data-source where data will be retrieved from *(jhu/csbs/nyt)*. Default is *jhu* | String |
 
 #### Example Request
 ```http
@@ -160,7 +167,7 @@ GET /v2/locations
 __Query String Parameters__
 | __Query string parameter__ | __Description__ | __Type__ |
 | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | -------- |
-| source | The data-source where data will be retrieved from.<br>__Value__ can be: *jhu/csbs*. __Default__ is *jhu* | String |
+| source | The data-source where data will be retrieved from.<br>__Value__ can be: *jhu/csbs/nyt*. __Default__ is *jhu* | String |
 | country_code | The ISO ([alpha-2 country_code](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2)) to the Country/Province for which you're calling the Endpoint | String |
 | timelines | To set the visibility of timelines (*daily tracking*).<br>__Value__ can be: *0/1*. __Default__ is *0* (timelines are not visible) | Integer |
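
The query parameters documented in the README hunks above can be combined into a request URL as in the following minimal sketch. The host `https://example.com` and the helper name `locations_url` are assumptions made for illustration; only the `/v2/locations` path and the `source`, `timelines` and `country_code` parameters come from the README.

```python
from urllib.parse import urlencode

# Hypothetical host: substitute the address of your own deployment.
BASE_URL = "https://example.com"


def locations_url(source="jhu", timelines=False, country_code=None):
    """Build a /v2/locations URL from the documented query parameters."""
    # The README documents timelines as an integer flag (0 or 1).
    params = {"source": source, "timelines": int(timelines)}
    if country_code is not None:
        params["country_code"] = country_code
    return f"{BASE_URL}/v2/locations?{urlencode(params)}"


# Request county-level NYT data including the daily timelines.
nyt_url = locations_url(source="nyt", timelines=True)
print(nyt_url)
```

Leaving `source` out of a real request falls back to *jhu*, as the README states, so the helper mirrors that default.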

app/data/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -1,9 +1,10 @@
 """app.data"""
 from ..services.location.csbs import CSBSLocationService
 from ..services.location.jhu import JhuLocationService
+from ..services.location.nyt import NYTLocationService
 
 # Mapping of services to data-sources.
-DATA_SOURCES = {"jhu": JhuLocationService(), "csbs": CSBSLocationService()}
+DATA_SOURCES = {"jhu": JhuLocationService(), "csbs": CSBSLocationService(), "nyt": NYTLocationService()}
 
 
 def data_source(source):

app/enums/sources.py

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ class Sources(str, Enum):
 
     jhu = "jhu"
     csbs = "csbs"
+    nyt = "nyt"
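
Because `Sources` mixes in `str`, a raw query-string value can be matched against the enum by value. The sketch below re-declares the three members shown in the hunk so it runs on its own; `parse_source` is a hypothetical helper written for this example and is not part of the commit.

```python
from enum import Enum


class Sources(str, Enum):
    """Re-declaration of the enum from the hunk above, for a standalone example."""

    jhu = "jhu"
    csbs = "csbs"
    nyt = "nyt"


def parse_source(raw):
    """Hypothetical helper: return the matching member, or None if unknown."""
    try:
        # Enum lookup by value raises ValueError for unsupported sources.
        return Sources(raw.lower())
    except ValueError:
        return None


parsed = parse_source("NYT")
unknown = parse_source("bbc")
print(parsed, unknown)
```

With the `str` mixin, `parsed == "nyt"` also holds, which is convenient when the member is later used as a dictionary key such as the ones in `DATA_SOURCES`.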

app/location/nyt.py

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+"""app.location.nyt.py"""
+from . import TimelinedLocation
+
+
+class NYTLocation(TimelinedLocation):
+    """
+    A NYT (county) TimelinedLocation.
+    """
+
+    # pylint: disable=too-many-arguments,redefined-builtin
+    def __init__(self, id, state, county, coordinates, last_updated, timelines):
+        super().__init__(id, "US", state, coordinates, last_updated, timelines)
+
+        self.state = state
+        self.county = county
+
+    def serialize(self, timelines=False):  # pylint: disable=arguments-differ,unused-argument
+        """
+        Serializes the location into a dict.
+
+        :returns: The serialized location.
+        :rtype: dict
+        """
+        serialized = super().serialize(timelines)
+
+        # Update with new fields.
+        serialized.update(
+            {"state": self.state, "county": self.county,}
+        )
+
+        # Return the serialized location.
+        return serialized
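
The `serialize` override above extends the parent's dict instead of rebuilding it. A minimal standalone sketch of that pattern follows. `BaseLocation` is a hypothetical stand-in for `TimelinedLocation`, whose implementation is not part of this diff, so its fields here are assumptions.

```python
class BaseLocation:
    """Hypothetical stand-in for TimelinedLocation (not shown in this diff)."""

    def __init__(self, id, country, province):
        self.id = id
        self.country = country
        self.province = province

    def serialize(self, timelines=False):
        # The real base class presumably also emits coordinates and timelines.
        return {"id": self.id, "country": self.country, "province": self.province}


class CountyLocation(BaseLocation):
    """County-level location that adds state and county to the base fields."""

    def __init__(self, id, state, county):
        # As in NYTLocation, the country is fixed to "US" and the state
        # doubles as the province.
        super().__init__(id, "US", state)
        self.state = state
        self.county = county

    def serialize(self, timelines=False):
        serialized = super().serialize(timelines)
        serialized.update({"state": self.state, "county": self.county})
        return serialized


snohomish = CountyLocation(0, "Washington", "Snohomish").serialize()
print(snohomish)
```

Extending the parent's result keeps the county-specific class small: any field the base class adds later shows up in the serialized county without further changes.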

app/services/location/nyt.py

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
+"""app.services.location.nyt.py"""
+import csv
+from datetime import datetime
+
+from asyncache import cached
+from cachetools import TTLCache
+
+from ...coordinates import Coordinates
+from ...location.nyt import NYTLocation
+from ...timeline import Timeline
+from ...utils import httputils
+from . import LocationService
+
+
+class NYTLocationService(LocationService):
+    """
+    Service for retrieving locations from New York Times (https://github.com/nytimes/covid-19-data).
+    """
+
+    async def get_all(self):
+        # Get the locations.
+        locations = await get_locations()
+        return locations
+
+    async def get(self, loc_id):  # pylint: disable=arguments-differ
+        # Get location at the index equal to provided id.
+        locations = await self.get_all()
+        return locations[loc_id]
+
+
+# ---------------------------------------------------------------
+
+
+# URL of the NYT US counties time series (CSV).
+BASE_URL = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"
+
+
+def get_grouped_locations_dict(data):
+    """
+    Helper function to group history for locations into one dict.
+
+    :returns: The complete data for each unique US county
+    :rtype: dict
+    """
+    grouped_locations = {}
+
+    # in increasing order of dates
+    for row in data:
+        county_state = (row["county"], row["state"])
+        date = row["date"]
+        confirmed = row["cases"]
+        deaths = row["deaths"]
+
+        # initialize if not existing
+        if county_state not in grouped_locations:
+            grouped_locations[county_state] = {"confirmed": [], "deaths": []}
+
+        # append confirmed tuple to county_state (date, # confirmed)
+        grouped_locations[county_state]["confirmed"].append((date, confirmed))
+        # append deaths tuple to county_state (date, # deaths)
+        grouped_locations[county_state]["deaths"].append((date, deaths))
+
+    return grouped_locations
+
+
+@cached(cache=TTLCache(maxsize=1024, ttl=3600))
+async def get_locations():
+    """
+    Returns a list containing parsed NYT data by US county. The data is cached for 1 hour.
+
+    :returns: The complete data for US Counties.
+    :rtype: list
+    """
+
+    # Request the data.
+    async with httputils.CLIENT_SESSION.get(BASE_URL) as response:
+        text = await response.text()
+
+    # Parse the CSV.
+    data = list(csv.DictReader(text.splitlines()))
+
+    # Group together locations (NYT data ordered by dates not location).
+    grouped_locations = get_grouped_locations_dict(data)
+
+    # The normalized locations.
+    locations = []
+
+    for idx, (county_state, histories) in enumerate(grouped_locations.items()):
+        # Make location history for confirmed and deaths from dates.
+        # List is tuples of (date, amount) in order of increasing dates.
+        confirmed_list = histories["confirmed"]
+        confirmed_history = {date: int(amount or 0) for date, amount in confirmed_list}
+
+        deaths_list = histories["deaths"]
+        deaths_history = {date: int(amount or 0) for date, amount in deaths_list}
+
+        # Normalize the item and append to locations.
+        locations.append(
+            NYTLocation(
+                id=idx,
+                state=county_state[1],
+                county=county_state[0],
+                coordinates=Coordinates(None, None),  # NYT does not provide coordinates
+                last_updated=datetime.utcnow().isoformat() + "Z",  # since last request
+                timelines={
+                    "confirmed": Timeline(
+                        {
+                            datetime.strptime(date, "%Y-%m-%d").isoformat() + "Z": amount
+                            for date, amount in confirmed_history.items()
+                        }
+                    ),
+                    "deaths": Timeline(
+                        {
+                            datetime.strptime(date, "%Y-%m-%d").isoformat() + "Z": amount
+                            for date, amount in deaths_history.items()
+                        }
+                    ),
+                    "recovered": Timeline({}),
+                },
+            )
+        )
+
+    return locations
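
The service above turns date-ordered CSV rows into one timeline per county. The sketch below reproduces that idea without the network request, the cache, or the project classes: it groups a few rows by `(county, state)` and keys each history by an ISO timestamp. `SAMPLE_CSV` and `group_by_county` are names invented for this example, and grouping and timestamp conversion are merged into one pass for brevity, which differs from the two-step code in the commit.

```python
import csv
from datetime import datetime

# Four rows in the same column layout as the NYT us-counties.csv file.
SAMPLE_CSV = """date,county,state,fips,cases,deaths
2020-01-24,Cook,Illinois,17031,1,0
2020-01-24,Snohomish,Washington,53061,1,0
2020-01-30,Cook,Illinois,17031,2,0
2020-03-11,Snohomish,Washington,53061,69,1
"""


def group_by_county(rows):
    """Group date-ordered rows into per-county confirmed and deaths histories."""
    grouped = {}
    for row in rows:
        key = (row["county"], row["state"])
        history = grouped.setdefault(key, {"confirmed": {}, "deaths": {}})
        # Same timestamp format as the service: ISO 8601 with a trailing "Z".
        stamp = datetime.strptime(row["date"], "%Y-%m-%d").isoformat() + "Z"
        # Empty cells are treated as zero, mirroring int(amount or 0).
        history["confirmed"][stamp] = int(row["cases"] or 0)
        history["deaths"][stamp] = int(row["deaths"] or 0)
    return grouped


rows = list(csv.DictReader(SAMPLE_CSV.splitlines()))
grouped = group_by_county(rows)
print(grouped[("Cook", "Illinois")]["confirmed"])
```

Keying on the `(county, state)` tuple matters because county names repeat across states, so the county name alone would merge unrelated histories.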

tests/example_data/counties.csv

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+date,county,state,fips,cases,deaths
+2020-01-21,Snohomish,Washington,53061,1,0
+2020-01-22,Snohomish,Washington,53061,1,0
+2020-01-23,Snohomish,Washington,53061,1,0
+2020-01-24,Cook,Illinois,17031,1,0
+2020-01-24,Snohomish,Washington,53061,1,0
+2020-01-25,Orange,California,06059,1,0
+2020-01-25,Cook,Illinois,17031,1,0
+2020-01-25,Snohomish,Washington,53061,1,0
+2020-01-26,Maricopa,Arizona,04013,1,0
+2020-01-26,Los Angeles,California,06037,1,0
+2020-01-26,Orange,California,06059,1,0
+2020-01-26,Cook,Illinois,17031,1,0
+2020-01-26,Snohomish,Washington,53061,1,0
+2020-01-27,Maricopa,Arizona,04013,1,0
+2020-01-27,Los Angeles,California,06037,1,0
+2020-01-27,Orange,California,06059,1,0
+2020-01-27,Cook,Illinois,17031,1,0
+2020-01-27,Snohomish,Washington,53061,1,0
+2020-01-28,Maricopa,Arizona,04013,1,0
+2020-01-28,Los Angeles,California,06037,1,0
+2020-01-28,Orange,California,06059,1,0
+2020-01-28,Cook,Illinois,17031,1,0
+2020-01-28,Snohomish,Washington,53061,1,0
+2020-01-29,Maricopa,Arizona,04013,1,0
+2020-01-29,Los Angeles,California,06037,1,0
+2020-01-29,Orange,California,06059,1,0
+2020-01-29,Cook,Illinois,17031,1,0
+2020-01-29,Snohomish,Washington,53061,1,0
+2020-01-30,Maricopa,Arizona,04013,1,0
+2020-01-30,Los Angeles,California,06037,1,0
+2020-01-30,Orange,California,06059,1,0
+2020-01-30,Cook,Illinois,17031,2,0
+2020-01-30,Snohomish,Washington,53061,1,0
+2020-01-31,Maricopa,Arizona,04013,1,0
+2020-01-31,Los Angeles,California,06037,1,0
+2020-01-31,Orange,California,06059,1,0
+2020-01-31,Santa Clara,California,06085,1,0
+2020-01-31,Cook,Illinois,17031,2,0
+2020-01-31,Snohomish,Washington,53061,1,0
+2020-02-28,Snohomish,Washington,53061,2,0
+2020-03-10,Snohomish,Washington,53061,61,0
+2020-03-11,Snohomish,Washington,53061,69,1
+2020-03-12,Snohomish,Washington,53061,107,3
+2020-03-15,Snohomish,Washington,53061,175,3
+2020-03-17,Snohomish,Washington,53061,265,4
+2020-03-18,Snohomish,Washington,53061,309,5
+2020-03-19,Snohomish,Washington,53061,347,6
+2020-03-20,Snohomish,Washington,53061,384,7
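
The counts in this fixture are cumulative and the dates are sparse (Snohomish jumps from 2020-01-31 to 2020-02-28), so the most recent row per county gives its current total. The sketch below shows one way a check against such mock data could look. It embeds four fixture rows inline rather than reading the file, and `latest_by_county` is a hypothetical helper, not one of the tests added by this commit.

```python
import csv

# A few rows copied from the fixture, deliberately out of county order.
FIXTURE_ROWS = """date,county,state,fips,cases,deaths
2020-03-18,Snohomish,Washington,53061,309,5
2020-01-31,Cook,Illinois,17031,2,0
2020-03-19,Snohomish,Washington,53061,347,6
2020-03-20,Snohomish,Washington,53061,384,7
"""


def latest_by_county(rows):
    """Keep the most recent row for each (county, state) pair."""
    latest = {}
    for row in rows:
        key = (row["county"], row["state"])
        # YYYY-MM-DD strings sort chronologically, so string comparison suffices.
        if key not in latest or row["date"] > latest[key]["date"]:
            latest[key] = {
                "date": row["date"],
                "cases": int(row["cases"]),
                "deaths": int(row["deaths"]),
            }
    return latest


latest = latest_by_county(csv.DictReader(FIXTURE_ROWS.splitlines()))
print(latest[("Snohomish", "Washington")])
```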
