
Commit 6f5a7a7

update Julia version which ignores sigla
1 parent 1cf0b9a commit 6f5a7a7


2 files changed (+12 -10 lines)


README.md

Lines changed: 7 additions & 6 deletions
@@ -14,13 +14,13 @@ The following results are for 123 unique utf-8 Bible text files in 23 languages
 * Machine: MacBook Pro 16" 64GB 2TB M1Max 10 cores.
 
 <pre>
-1. Rust 1.58 = 1.14s, with sorting: 1.59s
+1. Rust 1.18 = 1.14s, with sorting: 1.59s
 2. Golang 1.17.7 = 1.84s, with sorting: 2.16s
-3. Julia 1.7.1 = 4.52s
-4. Python 3.10.2 = 4.53s, with sorting: 4.65s
-5. Crystal 1.3.2 = 5.72s
-6. Elixir 1.13.2 = 7.82s
-7. Ruby 3.1.0 = 10.56s, with sorting: 10.70s
+3. Python 3.10.2 = 4.53s, with sorting: 4.65s
+4. Crystal 1.3.2 = 5.72s
+5. Elixir 1.13.2 = 7.82s
+6. Julia 1.7.1 = 12.46s, with sorting: 13.71s
+7. Ruby 3.1.0 = 13.00s, with sorting: 13.04s
 </pre>
 
 ### Conclusion
@@ -57,3 +57,4 @@ The new optimized Golang code version is very fast, slower than Rust but faster
 
 * Added newer Golang 1.1.17 and improved code
 * Added fixed Python version ignoring sigla (like in Ruby version)
+* Added fixed Julia version ignoring sigla
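In the `get_words` function touched by this commit, each line's leading book reference is skipped by keeping only what follows its first two space-separated fields; that is presumably what "ignoring sigla" refers to. A minimal sketch of that step in isolation, assuming lines shaped like `Gen 1:1 In the beginning ...` (both the sample format and the helper name `strip_reference` are illustrative assumptions, not taken from the repository):

```julia
# Hypothetical helper isolating the reference-stripping step of get_words.
# Assumes each line starts with a two-field book reference such as "Gen 1:1";
# it drops the first two space-separated fields and rejoins the rest.
strip_reference(line::AbstractString) = join(split(line, " ")[begin+2:end], " ")

strip_reference("Gen 1:1 In the beginning God created the heaven and the earth.")
```

A line with two or fewer fields yields an empty string, because `[begin+2:end]` is then an empty range rather than an out-of-bounds index.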

example-julia/src/words.jl

Lines changed: 5 additions & 4 deletions
@@ -25,19 +25,20 @@ function worker(yaml_path, sorting, i, count)
 end
 
 function get_words(yaml_path, sorting = false)
-    words = []
+    pattern = r"[\W\d]+"
+    unique_words = Set()
     open(replace(yaml_path, ".yml" => ".txt")) do file
         for line in readlines(file)
             # exclude beginning book refrence from the line
             text = split(line, " ")[begin+2:end] |> t -> join(t, " ")
             tokens =
                 text |>
                 lowercase |>
-                t -> split(t, r"[\W\d]+") |> t -> filter(token -> length(token) > 1, t)
-            append!(words, tokens)
+                t -> split(t, pattern) |> t -> filter(token -> length(token) > 1, t)
+            union!(unique_words, tokens)
         end
     end
-    unique_words = Set(words)
+    # unique_words = Set(words)
     if sorting
         arr = collect(unique_words)
         sort(arr)
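The substantive change in `get_words` is how unique words are gathered: the old code appended every token to an untyped array and built a `Set` once at the end, while the new code binds the regex to `pattern` and merges each line's tokens into the `Set` with `union!`. The sketch below contrasts the two strategies on in-memory text instead of a file; `PATTERN`, `tokenize`, `unique_words_old` and `unique_words_new` are hypothetical names, and the input is assumed to have the book reference already removed.

```julia
# Sketch contrasting the two ways get_words collects unique words in this diff.
# Names are hypothetical; input texts are assumed to have the book reference
# already stripped.

# Regex bound to a constant, mirroring the `pattern` variable in the new code:
# it splits on runs of non-word characters and digits.
const PATTERN = r"[\W\d]+"

# Lowercase, split on PATTERN, and keep tokens of two or more characters.
tokenize(text) = filter(t -> length(t) > 1, split(lowercase(text), PATTERN))

# Old approach: store every token, duplicates included, then deduplicate once.
function unique_words_old(texts)
    words = []
    for text in texts
        append!(words, tokenize(text))
    end
    return Set(words)
end

# New approach: merge each line's tokens straight into the set,
# so repeated words are never stored twice.
function unique_words_new(texts)
    unique_words = Set()
    for text in texts
        union!(unique_words, tokenize(text))
    end
    return unique_words
end

texts = ["In the beginning God created the heaven and the earth.",
         "And the earth was without form"]

# sort! sorts the collected vector in place and returns it.
words = sort!(collect(unique_words_new(texts)))
```

Both functions return the same set for the same input; the old one additionally keeps every occurrence in `words` until the final `Set(words)`. As in the original, the containers are untyped (`[]` and `Set()` hold `Any`). Note that Julia's `sort` returns a new sorted array and leaves its argument unchanged, whereas `sort!` sorts in place.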
