Consider the following two Spark dataframes:
df1.show()
+----+------+-------+
|id_a|time_a|value_a|
+----+------+-------+
| 1| 1| CA|
| 1| 2| CA|
| 2| 1| TX|
| 3| 5| NE|
| 4| 6| WA|
+----+------+-------+
df2.show()
+----+------+-----------+
|id_b|time_b| value_b|
+----+------+-----------+
| 1| 1| San Jose|
| 2| 1|Los Angeles|
| 2| 2| Austin|
+----+------+-----------+
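For reference, the two dataframes can be reproduced with something along these lines (a minimal sketch, assuming a local SparkSession; the column names match the tables shown above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Build df1 and df2 from the rows shown above
df1 = spark.createDataFrame(
    [(1, 1, "CA"), (1, 2, "CA"), (2, 1, "TX"), (3, 5, "NE"), (4, 6, "WA")],
    ["id_a", "time_a", "value_a"],
)
df2 = spark.createDataFrame(
    [(1, 1, "San Jose"), (2, 1, "Los Angeles"), (2, 2, "Austin")],
    ["id_b", "time_b", "value_b"],
)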
Now assume you want to join the two dataframes using both the id columns and the time columns. This can easily be done in PySpark (note that the columns are named id_a/time_a and id_b/time_b, and that the join type is passed via the how parameter):
df = df1.join(df2, (df1.id_a == df2.id_b) & (df1.time_a == df2.time_b), how="inner")
df.show()
+----+------+-------+----+------+-----------+
|id_a|time_a|value_a|id_b|time_b| value_b|
+----+------+-------+----+------+-----------+
| 1| 1| CA| 1| 1| San Jose|
| 2| 1| TX| 2| 1|Los Angeles|
+----+------+-------+----+------+-----------+
Note that the parentheses around the two conditions are absolutely necessary: in Python, & binds more tightly than ==, so without them the expression is not parsed as the conjunction of two equality checks.
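A quick sketch of what goes wrong without them, using the dataframes above:

# Without parentheses, & is evaluated before ==, so Python reads this as a
# chained comparison involving (df2.id_b & df1.time_a). Truth-testing the
# intermediate Column raises a ValueError ("Cannot convert column into bool"):
# df1.join(df2, df1.id_a == df2.id_b & df1.time_a == df2.time_b, how="inner")

# With parentheses, each equality yields a boolean Column first, and the
# two Columns are then combined with &:
cond = (df1.id_a == df2.id_b) & (df1.time_a == df2.time_b)
df = df1.join(df2, cond, how="inner")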