Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

go - How to read Unicode characters from a data source

The code below can read a data source (following all the io.Reader rules) whose text contains only characters with one-byte UTF-8 encodings:

package main

import (
    "fmt"
    "io"
)

type MyStringData struct {
    str       string
    readIndex int
}

func (myStringData *MyStringData) Read(p []byte) (n int, err error) {

    // convert `str` string to slice of bytes
    strBytes := []byte(myStringData.str)

    // if `readIndex` is GTE source length, return `EOF` error
    if myStringData.readIndex >= len(strBytes) {
        return 0, io.EOF // `0` bytes read
    }

    // get next readable limit (exclusive)
    nextReadLimit := myStringData.readIndex + len(p)

    if nextReadLimit >= len(strBytes) {
        nextReadLimit = len(strBytes)
        err = io.EOF
    }

    // get next bytes to copy and set `n` to its length
    nextBytes := strBytes[myStringData.readIndex:nextReadLimit]
    n = len(nextBytes)

    // copy all bytes of `nextBytes` into `p` slice
    copy(p, nextBytes)

    // increment `readIndex` to `nextReadLimit`
    myStringData.readIndex = nextReadLimit

    // return values
    return
}

func main() {

    // create data source
    src := MyStringData{str: "Hello Amazing World!"} // 学中文

    p := make([]byte, 3) // slice of length `3`

    // read `src` until an error is returned
    for {
        // read `p` bytes from `src`
        n, err := src.Read(p)
        fmt.Printf("%d bytes read, data:%s\n", n, p[:n])

        // handle error
        if err == io.EOF {
            fmt.Println("--end-of-file--")
            break
        } else if err != nil {
            fmt.Println("Oops! some error occurred!", err)
            break
        }
    }
}

Output:

$
$
$ go run src/../Main.go
3 bytes read, data:Hel
3 bytes read, data:lo 
3 bytes read, data:Ama
3 bytes read, data:zin
3 bytes read, data:g W
3 bytes read, data:orl
2 bytes read, data:d!
--end-of-file--
$
$

However, the same code produces garbled output when the text contains characters whose UTF-8 encodings are longer than one byte, for example:

  src := MyStringData{str: "Hello Amazing World!学中文"} 

Below is the output:

$
$
$ go run src/../Main.go
3 bytes read, data:Hel
3 bytes read, data:lo 
3 bytes read, data:Ama
3 bytes read, data:zin
3 bytes read, data:g W
3 bytes read, data:orl
3 bytes read, data:d!?
3 bytes read, data:???
3 bytes read, data:???
2 bytes read, data:??
--end-of-file--
$
$

Edit:

Following the comments suggesting strings.NewReader(), I modified the code as follows:

package main

import (
    "fmt"
    "io"
    "strings"
)

func main() {
    // create data source
    src := strings.NewReader("Hello Amazing World!学中文")

    // read `src` one rune (character) at a time until an error is returned
    for {
        ch, n, err := src.ReadRune()

        // handle the error first: at end of input ReadRune returns (0, 0, io.EOF),
        // and that zero rune should not be printed as data
        if err == io.EOF {
            fmt.Println("--end-of-file--")
            break
        } else if err != nil {
            fmt.Println("Oops! some error occurred!", err)
            break
        }

        fmt.Printf("%d bytes read, data:%c\n", n, ch)
    }
}

How can I read Unicode characters without a single character being split across two Read calls?

...