Threads are one of the harder parts of language implementation, both to use correctly and to implement correctly. Writing thread-safe code is tedious and notoriously difficult to debug: race conditions, memory leaks, and subtle undefined behavior are common. Languages with thoughtful concurrency models (Go and its goroutines are a frequent example) make this problem far more manageable.
I’ve had my share of painful bugs while implementing threading in my own language, Sunflower. Below I explain the problems I faced, the design choices I considered, and the working implementation I arrived at.
Problems I ran into
Some of the hard issues I encountered while implementing threads:
Tracking down undefined behavior such as random segmentation faults.
Allocating too many OS threads and causing resource exhaustion / leaks.
Freeing resources too early and producing dangling pointers.
A few other embarrassments I’ll spare you the details of.
What is a thread?
A thread is a lightweight sequence of execution that runs concurrently with other threads and the main program. Each thread has its own stack and program counter and behaves like a small, independent program. For example, in a game one thread might listen for input, another maintain the network connection, and another render frames; they run concurrently and (on multicore hardware) possibly in parallel.
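The game example can be sketched in a few lines of C++. This is a minimal illustration, not Sunflower runtime code: the task bodies are placeholders, and the names (run_game_loop, frames_rendered) are invented for the sketch.

```cpp
#include <atomic>
#include <thread>

// Three independent tasks, each on its own thread with its own stack,
// running concurrently with main.
std::atomic<int> frames_rendered{0};

void render_frames()
{
  for (int i = 0; i < 3; ++i)
    frames_rendered.fetch_add(1); /* pretend to render one frame */
}

int run_game_loop()
{
  std::thread input([] { /* poll for input */ });
  std::thread net([] { /* keep the connection alive */ });
  std::thread render(render_frames);
  input.join(); /* wait for each thread before returning */
  net.join();
  render.join();
  return frames_rendered.load();
}
```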
The naïve approach, and why it fails
My initial approach created one OS thread for every high-level thread requested from Sunflower’s frontend. That works for a few threads but quickly hits system limits: creating thousands of OS threads leads to segmentation faults or poor performance because the OS can’t allocate stack and scheduling resources for so many threads.
This made me study how modern languages handle high numbers of concurrent tasks.
How other languages solve it (brief)
Go (and several other runtimes) uses an N:M threading model: map N logical jobs (goroutines) onto M OS threads (workers), where M is sized according to hardware concurrency. The runtime schedules many logical tasks on a smaller pool of OS threads and thus avoids the cost and limit of one-to-one OS thread allocation.
I implemented a worker pool: M long-lived OS threads (workers) that sleep when idle and wake to execute queued jobs. Workers are driven by a condition_variable. When a job is queued, the runtime wakes one worker (or more) and a free worker picks up and executes the job.
This differs from a strict round-robin scheduler: jobs are picked up by whichever worker happens to be free, which naturally spreads the load across the pool.
High-level architecture
A global queue of pending ThreadHandle jobs (q_jobs).
M worker threads (v_workers) that wait on a condition_variable.
v_handles stores active ThreadHandle pointers and allows reuse of slots via idx_avl.
API primitives: create, run, join, join_all, and close.
Initializer (create worker threads)
static void worker_loop (); /* defined below */
static std::vector<std::thread> v_workers;

SF_API void
init_runtime_threads ()
{
  size_t n = std::thread::hardware_concurrency ();
  if (!n)
    n = 4; /* fallback when the hardware concurrency is unknown */
  for (size_t i = 0; i < n; ++i)
    v_workers.emplace_back(worker_loop);
}
v_workers holds the M workers that watch the job queue and execute work.
static void
worker_loop ()
{
  while (1)
    {
      ThreadHandle *th = nullptr;
      {
        std::unique_lock<std::mutex> lock(wl_mutex);
        t_cv.wait(lock, [] { return shutting_down || !q_jobs.empty(); });
        if (shutting_down && q_jobs.empty())
          return;
        th = q_jobs.front();
        q_jobs.pop();
      }
      if (th == nullptr)
        return;
      /* call sunflower function (execute the job) */
      th->get_done() = true;
      if (th->get_is_closed())
        {
          size_t id = th->get_id();
          {
            std::lock_guard<std::mutex> lock(cre_mutex);
            delete v_handles[id];
            v_handles[id] = nullptr;
            idx_avl.push_back(id);
          }
        }
      /* local cleanup */
    }
}
Key point: the worker uses std::unique_lock + condition_variable::wait(predicate) to block efficiently until work arrives or shutdown is requested. This avoids busy-waiting and wasted CPU cycles.
static Vec<ThreadHandle *> v_handles; /* active handles */
static Vec<size_t> idx_avl; /* free indices */
static std::mutex cre_mutex; /* protect create/close operations */
static std::mutex wl_mutex; /* protect worker loop queue access */
static std::mutex thr_mutex; /* protect pushing jobs into queue */
static std::condition_variable t_cv;
static std::queue<ThreadHandle *> q_jobs;
static std::vector<std::thread> v_workers;
static bool shutting_down = false;
ThreadHandle (summary)
class ThreadHandle
{
private:
  Object *name = nullptr;
  Object *args = nullptr;
  Module *mod = nullptr;
  bool done = false;
  bool is_closed = false;
  size_t id = 0;

public:
  ThreadHandle(Object *_Name, Object *_Args, Module *_Mod)
      : name{_Name}, args{_Args}, mod{_Mod}
  {
    IR(name);
    IR(args);
  }

  Module *&get_mod();
  Object *&get_name();
  Object *&get_args();
  inline bool &get_done();
  inline size_t &get_id();
  inline bool &get_is_closed();
  void run();

  ~ThreadHandle()
  {
    DR(name);
    DR(args);
  }
};
(Where IR/DR are the Sunflower runtime reference management macros.)
void
ThreadHandle::run()
{
  {
    std::lock_guard<std::mutex> lock(thr_mutex);
    q_jobs.push(this);
  }
  t_cv.notify_one();
}
run() enqueues the ThreadHandle and signals a worker to wake and pick it up.
_Native_Thread API
create
Creates a ThreadHandle and stores it in v_handles, reusing indices from idx_avl when possible:
SF_API Object *
create(Module *mod)
{
  std::lock_guard<std::mutex> lock(cre_mutex);
  Object *o_fname = mod->get_variable("fname");
  Object *o_fargs = mod->get_variable("fargs");
  assert(o_fname->get_type() == ObjectType::FuncObject);
  assert(o_fargs->get_type() == ObjectType::ArrayObj);
  ThreadHandle *th = new ThreadHandle(
      o_fname, o_fargs,
      static_cast<FunctionObject *>(o_fname)->get_v()->get_parent());
  size_t idx;
  if (idx_avl.get_size()) {
    size_t p = idx_avl.pop_back();
    v_handles[p] = th;
    idx = p;
  } else {
    idx = v_handles.get_size();
    v_handles.push_back(th);
  }
  th->get_id() = idx;
  Object *ret = static_cast<Object *>(new ConstantObject(
      static_cast<Constant *>(new IntegerConstant(static_cast<int>(idx)))));
  IR(ret);
  return ret;
}
run
Starts a previously created thread by enqueueing its handle:
SF_API Object *
run(Module *mod)
{
  Object *o_id = mod->get_variable("id");
  assert(OBJ_IS_INT(o_id));
  size_t id = static_cast<size_t>(
      static_cast<IntegerConstant *>(
          static_cast<ConstantObject *>(o_id)->get_c().get())->get_value());
  assert(id < v_handles.get_size());
  v_handles[id]->run();
  Object *ret = static_cast<Object *>(new ConstantObject(
      static_cast<Constant *>(new NoneConstant())));
  IR(ret);
  return ret;
}
join
Waits until the single thread has finished. (This implementation busy-waits on the done flag; it’s a simple workaround given detached execution by default.)
SF_API Object *
join(Module *mod)
{
  Object *o_id = mod->get_variable("id");
  assert(OBJ_IS_INT(o_id));
  size_t id = static_cast<size_t>(
      static_cast<IntegerConstant *>(
          static_cast<ConstantObject *>(o_id)->get_c().get())->get_value());
  assert(id < v_handles.get_size());
  ThreadHandle *&th = v_handles[id];
  while (!th->get_done())
    ;
  Object *ret = static_cast<Object *>(new ConstantObject(
      static_cast<Constant *>(new NoneConstant())));
  IR(ret);
  return ret;
}
Note: The busy-wait above is functional but not optimal. A better approach is to use a per-handle condition_variable or future/promise so join() can block efficiently without spinning.
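The promise/future variant can be sketched as follows. This is a hypothetical, simplified Handle type to illustrate the idea, not the actual Sunflower ThreadHandle: the worker fulfils the promise on completion, and join blocks on the matching future instead of spinning.

```cpp
#include <future>
#include <thread>

// Hypothetical handle: owns a promise; join waits on the future.
struct Handle
{
  int result = 0; /* stand-in for the job's effect */
  std::promise<void> done;
  std::future<void> done_future = done.get_future();
};

void run_job(Handle &h)
{
  std::thread([&h] {
    h.result = 42;      /* execute the job */
    h.done.set_value(); /* signal completion exactly once */
  }).detach();
}

void join_job(Handle &h)
{
  h.done_future.wait(); /* blocks efficiently, no spinning */
}
```

set_value synchronizes with wait, so anything the job wrote before signalling is visible after join returns.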
join_all
Waits for all outstanding threads:
SF_API Object *
join_all(Module *mod)
{
  for (ThreadHandle *&th : v_handles) {
    if (th == nullptr)
      continue;
    while (!th->get_done())
      ;
  }
  Object *ret = static_cast<Object *>(new ConstantObject(
      static_cast<Constant *>(new NoneConstant())));
  IR(ret);
  return ret;
}
close
Closes a handle, deleting it if finished or marking it for deletion (detached) if it’s still running:
SF_API Object *
close(Module *mod)
{
  std::lock_guard<std::mutex> lock(cre_mutex);
  Object *o_id = mod->get_variable("id");
  assert(OBJ_IS_INT(o_id));
  size_t id = static_cast<size_t>(
      static_cast<IntegerConstant *>(
          static_cast<ConstantObject *>(o_id)->get_c().get())->get_value());
  if (id < v_handles.get_size() && v_handles[id] != nullptr) {
    ThreadHandle *&th = v_handles[id];
    if (th->get_done() && !th->get_is_closed()) {
      delete th;
      th = nullptr;
      idx_avl.push_back(id);
    } else if (!th->get_done()) {
      th->get_is_closed() = true; /* detach: delete when done */
    }
  }
  Object *ret = static_cast<Object *>(new ConstantObject(
      static_cast<Constant *>(new NoneConstant())));
  IR(ret);
  return ret;
}
Sunflower intentionally does not provide a hard-kill thread primitive, since forcibly killing threads can cause severe undefined behavior.
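The safe alternative to a hard kill is cooperative cancellation: the thread checks a stop flag at safe points and exits cleanly on its own. A minimal C++ sketch of the pattern (the names stop_requested and run_until_stopped are invented for illustration):

```cpp
#include <atomic>
#include <thread>

// The thread polls a stop flag and returns when it is set,
// so all destructors and cleanup run normally.
std::atomic<bool> stop_requested{false};

bool run_until_stopped()
{
  std::thread worker([] {
    while (!stop_requested.load())
      {
        /* one unit of interruptible work */
      }
  });
  stop_requested.store(true); /* politely ask the thread to stop */
  worker.join();              /* it exits at the next flag check */
  return stop_requested.load();
}
```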
Object-oriented wrapper (Sunflower pseudo-code)
This is the high-level thread module wrapper that a Sunflower program would use:
import '_Native_Thread' as client

class Thread
    id = -1

    fun _init (self, fn, args)
        self.id = client.create(fn, args)

    fun run (self)
        if self.id == -1
            return? "Thread has not been initialized or is closed"
        client.run(self.id)

    fun join (self)
        if self.id == -1
            return? "Thread has not been initialized or is closed"
        return client.join(self.id)

    fun _kill (self)
        if self.id != -1
            client.close(self.id)

fun _init (fn, args)
    return Thread(fn, args)

fun join_all ()
    client.join_all()
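A hypothetical usage example of the wrapper above, in Sunflower pseudo-code (the greet function and println are invented for the sketch; only _init, run, and join come from the wrapper):

```
import 'Thread' as thread

fun greet (name)
    println('Hello from ' + name)

t = thread._init(greet, ['Sunflower'])
t.run()
t.join()
```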
Observations and next steps
The worker pool model avoids creating thousands of OS threads and keeps resource usage bounded by M.
Using a condition_variable prevents busy-waiting and reduces CPU waste.
Current join() and join_all() implementations use busy-waiting on a done flag. Replacing these with synchronization primitives (per-handle condition_variable, std::promise/std::future, or similar) would improve efficiency and responsiveness.
Return value handling from thread functions is intentionally minimal to avoid complex ownership and collection semantics; designing a robust mechanism for returning values (futures/promises or message passing) is a clear next enhancement.
Closing
This is the first stable threading implementation for Sunflower, a pragmatic, C++-based runtime using a fixed worker pool, condition variables, and careful resource management. There’s room for improvement (better join semantics, richer scheduling, more advanced task stealing, etc.), but the current design addresses the main pain points I encountered early on.
I’ll be documenting more of Sunflower’s language design and runtime internals in a short series on my blog. I’d love to hear feedback, suggestions, or pointers to good resources on language runtime design and concurrency. Cheers!