Commit 57da7c9

add Julia example and update README
1 parent 224bee8 commit 57da7c9

10 files changed: +38 -69 lines changed
.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,4 @@
-/.history/
+/**/.history/
 /.vscode/
 /**/.DS_Store
 
@@ -12,3 +12,4 @@
 /**/main.dwarf
 /**/main
 /**/target
+/**/.idea

README.md

Lines changed: 24 additions & 32 deletions
@@ -1,34 +1,26 @@
 # words_extractor
 
-| | | | | |
-|--- |--- |--- |--- |--- |
-| | | | | |
-| | | | | |
-| | | | | |
-
-Example of a text file parsing in several programming languages
-
-MacOS 12.2
-Rust 1.58.1
-MBP 16" 64GB 2TB M1Max 10 cores
-Tested on 123 files (504MB)
-
-Results:
-
-1. Rust with waitgroup 1.58.1 -> 0.3521 s
-2. Ruby 3.1 with Parallel -> 2.0542 s
-3. Python 3.10.2 with multiprocessing -> 2.9403 s
-4. Crystal 1.3.2 with channels -> 6.0035 s
-5. Go 1.18beta1 with waitgroup -> 7.2166 s
-
-For more files (just got into data/pl/* and clone files several times)
-
-Total files: 936
-Total size: 2745 MB
-
-1. Rust: 7.5436 s
-2. Python: 15.3445 s
-3. Crystal: 26.3150 s
-4. Ruby: 40.9438 s
-5. Golang: 1m13 s
-6. Elixir: 2m43 s
+### Info
+An example of text file parsing in several programming languages. The goal is to extract unique words from UTF-8 files and save the results into separate files.
+
+### Results
+
+The following results are for 936 files (2745 MB) on macOS 12.2 and a MacBook Pro 16" (64GB, 2TB, M1 Max, 10 cores). (To get more text files, go into data/pl/* and duplicate the files several times.) All examples use similar logic and a similar approach.
+
+<pre>
+1. Rust v1.58.1 = 7.54s
+2. Python v3.10.2 = 15.34s (with multiprocessing)
+3. Julia v1.7.1 = 17.00s
+4. Crystal v1.3.2 = 26.32s
+5. Ruby v3.1.0 = 40.94s (with Parallel)
+6. Golang v1.18beta1 = 73.00s
+7. Elixir v1.13.2 = 2m43s
+</pre>
+
+### Conclusion
+
+Rust is clearly the fastest language in this comparison.
+
+What is surprising is Golang's rather poor performance on this task. Crystal is faster than Golang but, on this task, it is still slower than Python, which is also surprising. (Neither Golang nor Crystal is my main field of expertise, so there may be some room for improvement, although I have shown this code to other people and so far nobody could improve it in any significant way. If I find a better implementation, I will update this comparison.)
+
+Python's high performance is interesting. Although it uses the standard library's multiprocessing module to utilize all CPU cores, it is still a dynamic, interpreted language, which would be expected to be slower than statically typed languages.
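
As a point of reference for the task the new README describes (extract the unique words of a UTF-8 file and save them to a separate file), here is a minimal Python sketch. It is not the repository's Python implementation: the tokenizer regex, the lowercasing, the sorted output, and the names `unique_words` and `extract_file` are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed tokenizer: a "word" is a run of Unicode word characters,
# excluding decimal digits and underscores. The benchmarked programs
# may split text differently.
WORD_RE = re.compile(r"[^\W\d_]+")


def unique_words(text: str) -> list[str]:
    """Return the distinct lowercase words of `text`, sorted by code point."""
    return sorted({word.lower() for word in WORD_RE.findall(text)})


def extract_file(src: Path, dst: Path) -> int:
    """Read `src` as UTF-8, write one unique word per line to `dst`.

    Returns the number of unique words written.
    """
    words = unique_words(src.read_text(encoding="utf-8"))
    dst.write_text("\n".join(words), encoding="utf-8")
    return len(words)
```

Collecting words in a set before sorting keeps memory proportional to the number of distinct words rather than to the size of the file, which matters for inputs in the hundreds of megabytes.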

example-golang/.tool-versions

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+golang 1.18beta1
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-name = "words_extractor_jl"
+name = "words"
 uuid = "ab5e5b3c-2775-42ba-a2f5-dc8ee1810597"
 authors = ["Jaroslaw Zabiello <[email protected]>"]
 version = "0.1.0"

example-julia/README.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Julia
+
+## run
+
+```
+julia -t10 src/words.jl
+```
+

words_extractor_jl/src/words_extractor_jl.jl renamed to example-julia/src/words.jl

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ function worker(yaml_path)
     path = get_filepath(yaml_path)
     words = get_words(yaml_path)
     write(path, join(words, "\n"))
-    println(string("Saved...", path))
+    # println(string("Saved...", path))
 end
 
 function get_words(yaml_path)
@@ -42,7 +42,7 @@ function main()
     end
     mkdir(folder)
     Threads.@threads for path in walk("../data/pl/", ".yml")
-        println("Spawn $path")
+        # println("Spawn $path")
         worker(path)
     end
 end
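
The loop in `main()` above fans one `worker` call per `.yml` path out across threads. For comparison, the same walk-and-dispatch shape can be sketched in Python with a thread pool. This is an illustration only: `walk`, `worker`, and `run` below are hypothetical stand-ins, not the Julia functions of the same names, and the worker body (split on whitespace, write sorted unique tokens) is an assumption. The README says the repository's Python version uses multiprocessing rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def walk(root: Path, suffix: str) -> list[Path]:
    """Collect every file under `root` whose name ends with `suffix`."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def worker(path: Path, out_dir: Path) -> Path:
    """Hypothetical per-file job: write the sorted unique tokens of `path`."""
    words = sorted(set(path.read_text(encoding="utf-8").split()))
    # Assumes file stems are unique across subdirectories.
    target = out_dir / (path.stem + ".txt")
    target.write_text("\n".join(words), encoding="utf-8")
    return target


def run(root: Path, out_dir: Path, suffix: str = ".yml", workers: int = 10) -> list[Path]:
    """Process every matching file under `root` using a pool of threads."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = walk(root, suffix)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `paths`.
        return list(pool.map(lambda p: worker(p, out_dir), paths))
```

The default of 10 workers mirrors the `-t10` flag in the Julia run command. In CPython, threads mainly help with the file I/O here, since the interpreter lock limits parallel execution of pure-Python code.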

words_extractor_ex/mix.exs

Lines changed: 0 additions & 29 deletions
This file was deleted.

words_extractor_ex/mix.lock

Lines changed: 0 additions & 4 deletions
This file was deleted.
