Commit 72abf8b

update README
1 parent a407424 commit 72abf8b

3 files changed: 16 additions, 13 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -12,4 +12,5 @@
 /**/main.dwarf
 /**/main
 /**/target
-/**/.idea
+/**/.idea
+/**/.venv

README.md

Lines changed: 14 additions & 11 deletions
@@ -1,26 +1,29 @@
 # words_extractor

 ### Info
+
 Example of a text file parsing in several programming languages. The goal is to extract unique words from utf-8 files and save results them into separate files.

 ### Results

-The following results are for 936 files (2745 MB) on MacOS 12.2 and MacBook Pro 16" 64GB 2TB M1Max 10 cores. (For more text files go into data/pl/* and duplicate files several times.) All examples are using a similar logic and approach.
+The following results are for 123 unique utf-8 Bible text files in 23 languages (used at mybible.pl site) They take 504MB. (The repo contains only a few sample files in the 'data' folder. For testing more data you could multiple files by cloning *.txt (and the associated*.yml) file under different names)
+
+* Platform: MacOS 12.2
+* Machine: MacBook Pro 16" 64GB 2TB M1Max 10 cores.

 <pre>
-1. Rust v1.58.1 = 7.54s
-2. Python v3.10.2 = 15.34s (with multiprocessing)
-3. Julia v1.7.1 = 17.00s
-4. Crystal v1.3.2 = 26.32s
-5. Ruby v3.1.0 = 40.94s (with Parallel)
-6. Golang v1.18beta1 = 73.00s
-7. Elixir v1.13.2 = 2m43s
+1. Rust 1.58 = 0.38s
+2. Python 3.10.2 = 2.80s
+3. Julia 1.7.1 = 4.522
+4. Crystal 1.3.2 = 5.72s
+5. Elixir 1.13.2 = 8.37s
+6. Ruby 3.1.0 = 8.31s
+
+Golang 1.17 = UNDER REFACTORING, stay tuned
 </pre>

 ### Conclusion

 Rust is the fastest language beyond doubt.

-What is surprised is pretty poor Golang's performance on this task. Crystal is faster than Golang but in this task it is still slower than Python which is also surprising. (Neither Golang nor Crystal is my main field of expertise so maybe there is some room for improvement. Although I showed this code to people and nobody so far could improve it in any significant way. But if I find a better implementation I will update this comparison.)
-
-The high Python performance is interesting. Although it is using a multiprocessing standard library for full CPU cores utilization this is still dynamic interpreted language after all, which is rather expected to be slower than statically typed languages.
+The high Python performance is interesting. Although it is using a multiprocessing standard library for full CPU cores utilization this is still dynamic interpreted language after all, which is rather expected to be slower than statically typed languages.

example-python/words.py

Lines changed: 0 additions & 1 deletion
@@ -51,7 +51,6 @@ def worker(path: str, outdir: str, sorting: bool = False) -> Tuple[str, int]:

 pool = mp.Pool(mp.cpu_count())

-print("Processing")
 results = []
 paths = glob.glob(src_path, recursive=True)
 if not paths:
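
Note for readers of this hunk: the body of `worker` is not shown in the diff, so the following is only a minimal sketch of what a function with that signature might do, based on the README's stated goal of extracting unique words from UTF-8 files and saving them to separate files. The tokenization rule, the lowercasing, the output file naming, the helper `extract_unique_words`, and the meaning of the returned tuple are all assumptions made for illustration, not the repository's actual code.

```python
import os
import re
from typing import Set, Tuple

# Assumption: a "word" is a maximal run of Unicode letters. The rule used by
# the repository is not visible in this diff.
WORD_RE = re.compile(r"[^\W\d_]+")


def extract_unique_words(text: str) -> Set[str]:
    """Return the set of distinct lowercased words found in text (hypothetical helper)."""
    return {match.group(0).lower() for match in WORD_RE.finditer(text)}


def worker(path: str, outdir: str, sorting: bool = False) -> Tuple[str, int]:
    """Write the unique words of one UTF-8 file to outdir, one word per line.

    Returns the output path and the number of unique words (assumed meaning
    of the Tuple[str, int] in the signature shown in the hunk header).
    """
    with open(path, encoding="utf-8") as src:
        words = extract_unique_words(src.read())
    ordered = sorted(words) if sorting else list(words)
    out_path = os.path.join(outdir, os.path.basename(path))
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(ordered))
    return out_path, len(ordered)
```

Because such a function only touches its own input and output files, it could be handed to the `mp.Pool` created in the hunk once per path returned by `glob.glob`, which is presumably how the script spreads the work across CPU cores.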
