This repository was archived by the owner on Oct 16, 2023. It is now read-only.

Added parallel code for chatglm-6B #225

Open. Caesar1993 wants to merge 10 commits into hpcaitech:main from Caesar1993:main
Conversation

Caesar1993 commented Sep 12, 2023

Added parallel code for chatglm-6B.
Because the model has relatively few parameters, parallel inference is slower than loading it on a single card, but the code can serve as a reference for inference with larger GLM models.

  1. Split the mixed (fused) qkv tensor from the Hugging Face chatglm implementation by attention head, take out the q, k, and v of each head, and then concatenate them back into a single qkv tensor.
  2. Define the chatglm layers in `__init__` and rebuild the forward function on top of the basic layers in ColossalAI.
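
Step 1 can be illustrated with a small index calculation. This is only a sketch, not the code in this PR: it assumes the fused weight stores its rows per head as `[q_h, k_h, v_h]` (which is how the Hugging Face chatglm-6B forward pass reshapes the projection output), and it assumes the target layout is all q rows, then all k rows, then all v rows for the heads owned by one rank. The function name `qkv_rows` and its parameters are hypothetical.

```python
def qkv_rows(num_heads: int, head_dim: int, rank: int = 0, world_size: int = 1) -> list[int]:
    """Row indices of a per-head fused qkv weight, regrouped as [Q | K | V].

    Assumption: row block (head * 3 + part) holds part 0/1/2 (q/k/v) of `head`.
    With world_size > 1, only the heads owned by `rank` are returned.
    """
    if num_heads % world_size != 0:
        raise ValueError("num_heads must be divisible by world_size")
    heads_per_rank = num_heads // world_size
    first_head = rank * heads_per_rank

    rows: list[int] = []
    for part in range(3):  # 0 = query, 1 = key, 2 = value
        for head in range(first_head, first_head + heads_per_rank):
            start = (head * 3 + part) * head_dim
            rows.extend(range(start, start + head_dim))
    return rows


# Two heads of size 2: rows 0-5 belong to head 0, rows 6-11 to head 1.
full = qkv_rows(num_heads=2, head_dim=2)
shard = qkv_rows(num_heads=2, head_dim=2, rank=1, world_size=2)
```

The returned list can be used to index the rows of the fused weight (and the matching bias entries), for example `weight[rows]` on a tensor, so that each rank receives whole heads rather than an arbitrary slice of the fused dimension.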
