read_jsonl.md (1811B)
# Working with JSON Lines Files

!!! warning "Deprecated"
    The JSONL functions in BazerUtils (`read_jsonl`, `stream_jsonl`, `write_jsonl`) are deprecated.
    Use [JSON.jl](https://github.com/JuliaIO/JSON.jl) v1 instead, which has native support:
    ```julia
    using JSON
    data = JSON.parsefile("data.jsonl"; jsonlines=true)  # read (JSON.parse expects JSON text, not a path)
    JSON.json("out.jsonl", data; jsonlines=true)         # write
    ```

---

## From the website: what is JSON Lines?

> JSON Lines (JSONL) is a convenient format for storing structured data that may be processed one record at a time. Each line is a valid JSON value, separated by a newline character. This format is ideal for large datasets and streaming applications.

For more details, see [jsonlines.org](https://jsonlines.org/).

---

## Legacy API (deprecated)

### `read_jsonl`

Reads the entire file or stream into memory and returns a vector of parsed JSON values.

```julia
using BazerUtils
data = read_jsonl("data.jsonl")                                             # from a file path
data = read_jsonl(IOBuffer("{\"a\": 1}\n{\"a\": 2}\n"))                      # from any IO stream
data = read_jsonl(IOBuffer("{\"a\": 1}\n{\"a\": 2}\n"); dict_of_json=true)  # with the dict_of_json option
```

### `stream_jsonl`

Creates a lazy iterator (a `Channel`) that yields one parsed JSON value at a time.

```julia
# Process records one at a time without loading the whole file
for record in stream_jsonl("data.jsonl")
    println(record)
end

# Take only the first 10 records
first10 = collect(Iterators.take(stream_jsonl("data.jsonl"), 10))
```

### `write_jsonl`

Writes an iterable of JSON-serializable values to a JSONL file, one value per line.

```julia
write_jsonl("out.jsonl", [Dict("a"=>1), Dict("b"=>2)])
write_jsonl("out.jsonl.gz", (Dict("i"=>i) for i in 1:100); compress=true)  # gzip-compressed output
```

---

## See Also

- [`JSON.jl`](https://github.com/JuliaIO/JSON.jl): The recommended replacement. Use `jsonlines=true` for JSONL support.
- [`CodecZlib.jl`](https://github.com/JuliaIO/CodecZlib.jl): Gzip compression support.
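
---

## Appendix: record-at-a-time reading without `stream_jsonl`

The `jsonlines=true` option shown at the top reads a whole file at once. For code that relied on `stream_jsonl` to process one record at a time, a minimal sketch with plain JSON.jl could look like the following. The helper name `foreach_jsonl` is hypothetical (it is not part of BazerUtils or JSON.jl), and the sketch assumes each line holds one complete JSON value that `JSON.parse` accepts as a string.

```julia
using JSON

# Hypothetical helper (not part of BazerUtils or JSON.jl): calls `f` on each
# parsed record of a JSONL file, reading one line at a time.
function foreach_jsonl(f, path::AbstractString)
    open(path, "r") do io
        for line in eachline(io)
            isempty(strip(line)) && continue  # skip blank lines
            f(JSON.parse(line))               # assumes one JSON value per line
        end
    end
    return nothing
end

# Usage: write a small temporary file, then collect the "a" field of each record.
path = tempname() * ".jsonl"
write(path, "{\"a\": 1}\n{\"a\": 2}\n")

values_of_a = Int[]
foreach_jsonl(path) do record
    push!(values_of_a, record["a"])
end
```

Because only the current line is held in memory, this pattern may suit large files where loading every record at once is not practical. Unlike `stream_jsonl`, it takes a callback rather than returning an iterator.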