Hacker News

You might want to tune the sampler, for example by setting a lower temperature. Also, the 4-bit RTN quantisation seems to be degrading the model; GPTQ quantisation would likely do much better.
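For intuition on why a lower temperature helps: temperature just rescales the logits before the softmax, so lowering it sharpens the output distribution toward the model's top choices. A minimal sketch (illustrative only, not the model's actual code):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution, making the
    # most likely token even more likely; higher temperature
    # flattens it toward uniform.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 2.0)
# The top token's probability grows as temperature drops,
# which is why low temperatures give more deterministic output.
assert sharp[0] > flat[0]
```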



Use `--top_p 2 --top_k 40 --repeat_penalty 1.176 --temp 0.7` with llama.cpp
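For intuition on what those three knobs do, here is a hedged Python sketch (illustrative only; llama.cpp's real sampler is in its C++ code, and note that a `top_p` of 2 effectively disables nucleus filtering, since probabilities sum to 1):

```python
import math

def sample_filter(logits, top_k, top_p, repeat_penalty, recent_tokens):
    # Hypothetical sketch of --top_k / --top_p / --repeat_penalty.
    # logits: dict mapping token id -> raw logit.
    logits = dict(logits)
    # repeat_penalty: shrink the logit of recently emitted tokens
    # (divide positive logits, scale negative ones) to discourage loops.
    for t in recent_tokens:
        if t in logits:
            if logits[t] > 0:
                logits[t] /= repeat_penalty
            else:
                logits[t] *= repeat_penalty
    # top_k: keep only the k highest-logit tokens.
    kept = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors.
    m = max(l for _, l in kept)
    exps = [(t, math.exp(l - m)) for t, l in kept]
    z = sum(e for _, e in exps)
    probs = [(t, e / z) for t, e in exps]
    # top_p (nucleus): keep the smallest prefix whose cumulative
    # probability reaches top_p; any value >= 1 keeps everything.
    out, cum = [], 0.0
    for t, p in probs:
        out.append((t, p))
        cum += p
        if cum >= top_p:
            break
    return out
```

With `top_p 2` the cutoff is never reached, so only `top_k` and the repetition penalty are actually pruning the candidate pool.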


Not bad with these settings:

    ./main -m ./models/7B/ggml-model-q4_0.bin \
    --top_p 2 --top_k 40 \
    --repeat_penalty 1.176 \
    --temp 0.7 \
    -p 'async fn download_url(url: &str)'


    async fn download_url(url: &str) -> io::Result<String> {
      let url = URL(string_value=url);
      if let Some(err) = url.verify() {} // nope, just skip the downloading part
      else match err == None {  // works now
        true => Ok(String::from(match url.open("get")?{
            |res| res.ok().expect_str(&url)?,
            |err: io::Error| Err(io::ErrorKind(uint16_t::MAX as u8))),
            false => Err(io::Error


lol,

    ./main -m ./models/7B/ggml-model-q4_0.bin \
    --top_p 2 --top_k 40 \
    --repeat_penalty 1.176 \
    --temp 0.7 \
    -p 'To seduce a woman, you first have to'
output:

    import numpy as np
    from scipy.linalg import norm, LinAlgError
    np.random.seed(10)
    x = -2*norm(LinAlgError())[0]  # error message is too long for command line use
    print x [end of text]


What fork are you using?

repeat_penalty is not an option.



It's a new feature :) Pull latest from master.



