The pronunciation of Chinese characters is written in the Latin alphabet, with diacritics marking the tones (for example tóngxué for 同学, “classmate”). So modern Chinese typing is mostly done on a normal Latin keyboard: you type the pronunciation (without tones, because that’s a pain) and pick from the keyboard’s guesses for what you meant to write. For example, to say “I want a cake” I’d type “woyaogedangao” and get 我要个蛋糕. This is the Pinyin system, which is used in mainland China among other places. Taiwan uses a different system and I don’t know how they type with it.
Source: Am learning Chinese.
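If it helps to see the "type the sounds, pick a guess" idea spelled out, here's a toy Python sketch. To be clear, this is not how any real input method is implemented: the tiny `CANDIDATES` table, the order of its entries, and the `segment`/`convert` helpers are all made up for illustration. Real keyboards rely on much bigger dictionaries and on context to rank their guesses.

```python
# Toy lookup table: toneless pinyin -> candidate words, best guess first.
# Entries and ordering are invented for this example.
CANDIDATES = {
    "wo": ["我", "握", "窝"],
    "yao": ["要", "药", "咬"],
    "ge": ["个", "歌", "哥"],
    "dangao": ["蛋糕"],
    "tongxue": ["同学"],
}


def segment(typed):
    """Greedily split a toneless pinyin string into the longest known keys."""
    pieces = []
    i = 0
    while i < len(typed):
        # Try the longest remaining chunk first, then shrink it.
        for j in range(len(typed), i, -1):
            if typed[i:j] in CANDIDATES:
                pieces.append(typed[i:j])
                i = j
                break
        else:
            raise ValueError(f"no candidates for {typed[i:]!r}")
    return pieces


def convert(typed, picks=None):
    """Convert typed pinyin to characters.

    picks maps a piece's position to the index of the candidate the user
    chose; any piece without a pick falls back to the first guess.
    """
    picks = picks or {}
    pieces = segment(typed)
    return "".join(CANDIDATES[p][picks.get(k, 0)] for k, p in enumerate(pieces))


print(convert("woyaogedangao"))                # 我要个蛋糕
print(convert("woyaogedangao", picks={2: 1}))  # 我要歌蛋糕 (picked 歌 instead of 个)
```

The second call shows why the choosing step matters: "ge" alone matches several characters, so without tones the keyboard can only guess and the typist has to confirm or correct it.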
You gotta interact with the language, in both its spoken and written forms (the former is more effective, but the latter is more accessible). Of course studying grammar and vocabulary is important, but in the end it’s a stepping stone toward comprehending native content. Admittedly I have no idea if there’s even such a thing as native Esperanto content, but yeah, that’s the gist of it. It’s also best if the content you’re consuming is something you actually enjoy. I, for example, learned English from memes on Facebook and then Reddit, and learned Japanese from anime and light novels. Something to take into consideration is the n+1 rule, which says that when consuming content in a language you’re trying to learn, you should pick something where you understand every word in a sentence except one. That lets you use context clues to work out the unknown word and makes the whole process more effective.
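For the curious, the n+1 rule is easy to turn into a rough filter. The Python sketch below is only an illustration under some assumptions: the function names are mine, the known-word set is a hand-written stand-in for whatever vocabulary list you actually have, and the regex word splitting only works for space-separated languages like English (Chinese or Japanese would need a proper word segmenter).

```python
import re


def unknown_words(sentence, known):
    """Return the distinct words in the sentence that are not in the known set."""
    words = re.findall(r"[a-z']+", sentence.lower())
    missing = []
    for word in words:
        if word not in known and word not in missing:
            missing.append(word)
    return missing


def n_plus_one(sentences, known):
    """Keep sentences with exactly one unknown word, paired with that word."""
    result = []
    for sentence in sentences:
        missing = unknown_words(sentence, known)
        if len(missing) == 1:
            result.append((sentence, missing[0]))
    return result


# Hypothetical vocabulary and sample sentences, just for the demo.
known = {"the", "cat", "sat", "on", "a", "mat", "dog", "ate"}
sentences = [
    "The cat sat on the mat.",                      # nothing new to learn
    "The dog ate a croissant.",                     # exactly one new word
    "Quantum chromodynamics perplexes everybody.",  # too many new words
]

for sentence, new_word in n_plus_one(sentences, known):
    print(f"{sentence}  (new word: {new_word})")
# The dog ate a croissant.  (new word: croissant)
```

Only the middle sentence survives: the first has nothing unknown in it, and the last has so many unknown words that context clues won't help.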
Also worth noting: learning two languages at once is, in my experience, not a good idea. They’ll start mixing, pronunciation rules from one will leak into the other, and the whole thing causes a headache that’s probably not worth it.
PS: I keep saying “consume” because I’m too awkward to talk to native speakers, but that’s also a good option from what I hear.